Abstract

This paper examines a general class of inferential problems in semiparametric
and nonparametric models defined by conditional moment restrictions. We construct
tests for the hypothesis that at least one element of the identified set satisfies
a conjectured (Banach space) “equality” and/or (a Banach lattice) “inequality”
constraint. Our procedure is applicable to identified and partially identified models, and
is shown to control the level, and under some conditions the size, asymptotically
uniformly in an appropriate class of distributions. The critical values are obtained by
building a strong approximation to the statistic and then bootstrapping a
(conservatively) relaxed form of the statistic. Sufficient conditions are provided, including
strong approximations using Koltchinskii’s coupling.

Leading special cases encompassed by the framework we study include:
(i) Tests of shape restrictions for infinite dimensional parameters; (ii) Confidence
regions for functionals that impose shape restrictions on the underlying
parameter; (iii) Inference for functionals in semiparametric and nonparametric models
defined by conditional moment (in)equalities; and (iv) Uniform inference in possibly
nonlinear and severely ill-posed problems.

Keywords: Shape restrictions, inference on functionals, conditional moment 
(in)equality restrictions, instrumental variables, nonparametric and semiparametric 
models, Banach space, Banach lattice, Koltchinskii coupling. 
1 Introduction


Nonparametric constraints, often called shape restrictions, have played a central role 
in economics as both testable implications of classical theory and sufficient conditions 
for obtaining informative counterfactual predictions (Topkis, 1998). A long tradition 
in applied and theoretical econometrics has as a result studied shape restrictions, their 
ability to aid in identification, estimation, and inference, and the possibility of testing for 
their validity (Matzkin, 1994). The canonical example of this interplay between theory 
and practice is undoubtedly consumer demand analysis, where theoretical predictions 
such as Slutsky symmetry have been extensively tested for and exploited in estimation 
(Hausman and Newey, 1995; Blundell et al., 2012). The empirical analysis of shape
restrictions, however, goes well beyond this important application, with recent examples
including studies into the monotonicity of the state price density (Jackwerth, 2000;
Aït-Sahalia and Duarte, 2003), the presence of ramp-up and start-up costs (Wolak, 2007;
Reguant, 2014), and the existence of complementarities in demand (Gentzkow, 2007)
and organizational design (Athey and Stern, 1998; Kretschmer et al., 2012).

Despite the importance of nonparametric constraints, their theoretical study has 
focused on a limited set of models and restrictions - a limitation that has resulted 
in practitioners often facing parametric modeling as their sole option. In this paper, 
we address this gap in the literature by developing a framework for testing general 
shape restrictions and exploiting them for inference in a widespread class of conditional 
moment restriction models. Specifically, we study nonparametric constraints in settings 
where the parameter of interest $\theta_0 \in \Theta$ satisfies $J$ conditional moment restrictions

$$E_P[\rho_j(X_i, \theta_0) \mid Z_{i,j}] = 0 \quad \text{for } 1 \le j \le J \qquad (1)$$

with $\rho_j : \mathbf{R}^{d_x} \times \Theta \to \mathbf{R}$ possibly non-smooth functions, $X_i \in \mathbf{R}^{d_x}$,
$Z_{i,j} \in \mathbf{R}^{d_{z_j}}$, and $P$ denoting the distribution of $(X_i, \{Z_{i,j}\}_{j=1}^J)$. As shown by
Ai and Chen (2007, 2012), under appropriate choices of the parameter space and moment
restrictions, this model encompasses parametric (Hansen, 1982), semiparametric (Ai and
Chen, 2003), and nonparametric (Newey and Powell, 2003) specifications, as well as panel
data applications (Chamberlain, 1992) and the study of plug-in functionals. By incorporating
nuisance parameters into the definition of the parameter space, it is in fact also possible
to view conditional moment (in)equality models as a special case of the specification
we study. For example, the restriction $E_P[\rho(X_i, \tilde\theta) \mid Z_i] \le 0$ may be rewritten as
$E_P[\rho(X_i, \tilde\theta) + \lambda(Z_i) \mid Z_i] = 0$ for some unknown positive function $\lambda$, which fits (1) with
$\theta = (\tilde\theta, \lambda)$ and $\lambda$ subject to the constraint $\lambda(Z_i) \ge 0$; see Example 2.4 below.

While in multiple applications identification of $\theta_0 \in \Theta$ is straightforward to establish,
there also exist specifications of the model we examine for which identification can be
uncertain (Canay et al., 2013; Chen et al., 2014). In order for our framework to be
robust to a possible lack of identification, we therefore define the identified set

$$\Theta_0(P) = \{\theta \in \Theta : E_P[\rho_j(X_i, \theta) \mid Z_{i,j}] = 0 \text{ for } 1 \le j \le J\} \qquad (2)$$

and employ it as the basis of our statistical analysis. Formally, for a set R of parameters 
satisfying a conjectured restriction, we develop a test for the hypothesis 

$$H_0 : \Theta_0(P) \cap R \neq \emptyset \qquad H_1 : \Theta_0(P) \cap R = \emptyset ; \qquad (3)$$

i.e. we devise a test of whether at least one element of the identified set satisfies the
posited constraints. In an identified model, a test of (3) is thus equivalent to a test of
whether $\theta_0$ satisfies the hypothesized constraint. The set $R$, for example, may constitute
the set of functions satisfying a conjectured shape restriction, in which case a test of 
(3) corresponds to a test of the validity of such shape restriction. Alternatively, the 
set R may consist of the functions that satisfy an assumed shape restriction and for 
which a functional of interest takes a prescribed value - in which case test inversion
of (3) yields a confidence region for the value of the desired functional that imposes the
assumed shape restriction on the underlying parameter. 

The wide class of hypotheses with which we are concerned requires the sets R to
be sufficiently general, yet endowed with enough structure to ensure a fruitful asymptotic
analysis. An important insight of this paper is that this simultaneous flexibility
and structure is possessed by sets defined by “equality” restrictions on Banach space 
valued maps, and “inequality” restrictions on Abstract M (AM) space valued maps (an 
AM space is a Banach lattice whose norm obeys a particular condition). We illustrate 
the generality granted by these sets by showing they enable us to employ tests of (3) 
to: (i) Conduct inference on the level of a demand function while imposing a Slutsky 
constraint; (ii) Construct a confidence interval in a regression discontinuity design where 
the conditional mean is known to be monotone in a neighborhood of, but not necessarily 
at, the discontinuity point; (iii) Test for the presence of complementarities in demand; 
and (iv) Conduct inference in semiparametric conditional moment (in)equality models. 
Additionally, while we do not pursue further examples in detail for conciseness, we note 
such sets R also allow for tests of homogeneity, supermodularity, and economies of scale 
or scope, as well as for inference on functionals of the identified set. 

As our test statistic, we employ the minimum of a suitable criterion function over
parameters satisfying the hypothesized restriction - an approach sometimes referred to
as a sieve generalized method of moments J-test. Under appropriate conditions, we
show that the distribution of the proposed statistic can be approximated by the law of
the projection of a Gaussian process onto the image of the local parameter space under
a linear map. In settings where the local parameter space is asymptotically linear and
the model is identified, the derived approximation can reduce to a standard chi-squared
distribution as in Hansen (1982). However, in the presence of “binding” shape restrictions
the local parameter space is often not asymptotically linear, resulting in non-pivotal
and potentially unreliable pointwise (in P) asymptotic approximations (Andrews, 2000,
2001). We address these challenges by projecting a bootstrapped version of the relevant
Gaussian process into the image of an appropriate sample analogue of the local parameter
space under an estimated linear map. Specifically, we establish that the resulting
critical values provide asymptotic size control uniformly over a class of underlying
distributions P. In addition, we characterize a set of alternatives for which the proposed
test possesses nontrivial local power. While aspects of our analysis are specific to the
conditional moment restriction model, the role of the local parameter space is solely
dictated by the set R. As such, we expect the insights of our arguments to be applicable
to the study of shape restrictions in alternative models as well.

1 Due to their uncommon use in econometrics, we overview AM spaces in Appendix A.

The literature on nonparametric shape restrictions in econometrics has classically
focused on testing whether conditional mean regressions satisfy the restrictions implied by
consumer demand theory; see Lewbel (1995), Haag et al. (2009), and references therein.
The related problem of studying monotone conditional mean regressions has also garnered
widespread attention - recent advances on this problem include Chetverikov
(2012) and Chatterjee et al. (2013). Chernozhukov et al. (2009) propose generic methods,
based on rearrangement and/or projection operators, that convert function estimators
and confidence bands into monotone estimators and confidence bands, provably
delivering finite-sample improvements; see Evdokimov (2010) for an application in the
context of structural heterogeneity models. Additional work concerning monotonicity
constraints includes Beare and Schmidt (2014) who test the monotonicity of the pricing
kernel, Chetverikov and Wilhelm (2014) who study estimation of a nonparametric
instrumental variable regression under monotonicity constraints, and Armstrong (2015)
who develops minimax rate optimal one sided tests in a Gaussian regression discontinuity
design. In related work, Freyberger and Horowitz (2012) examine the role of
monotonicity and concavity or convexity constraints in a nonparametric instrumental 
variable regression with discrete instruments and endogenous variables. Our paper also 
contributes to a literature studying semiparametric and nonparametric models under 
partial identification (Manski, 2003). Examples of such work include Chen et al. (2011a), 
Chernozhukov et al. (2013), Hong (2011), Santos (2012), and Tao (2014) for conditional 
moment restriction models, and Chen et al. (2011b) for the maximum likelihood setting. 

The remainder of the paper is organized as follows. In Section 2 we formally define 
the sets of restrictions we study and discuss examples that fall within their scope. In 
turn, in Section 3 we introduce our test statistic and basic notation that we employ 
throughout the paper. Section 4 obtains a rate of convergence for set estimators in 
conditional moment restriction models that we require for our subsequent analysis. Our 
main results are contained in Sections 5 and 6, which respectively characterize and
estimate the asymptotic distribution of our test statistic. Finally, Section 7 presents 
a brief simulation study, while Section 8 concludes. All mathematical derivations are 
included in a series of appendices; see in particular Appendix A for an overview of AM 
spaces and an outline of how Appendices B through H are organized. 


2 The Hypothesis 

In this section, we formally introduce the set of null hypotheses we examine as well as 
motivating examples that fall within their scope. 

2.1 The Restriction Set 

The defining elements determining the generality of the hypotheses allowed for in (3) are
the choice of parameter space $\Theta$ and the set of restrictions embodied by $R$. In imposing
restrictions on both $\Theta$ and $R$ we aim to allow for as general a framework as possible while
simultaneously ensuring enough structure for a fruitful asymptotic analysis. To this end,
we require the parameter space $\Theta$ to be a subset of a Banach space $\mathbf{B}$, and consider
sets $R$ that are defined through “equality” and “inequality” restrictions. Specifically,
for known maps $T_F$ and $T_G$, we impose that the set $R$ be of the form

$$R = \{\theta \in \mathbf{B} : T_F(\theta) = 0 \text{ and } T_G(\theta) \le 0\} . \qquad (4)$$

In order to allow for hypotheses that potentially concern global properties of $\theta$, such
as shape restrictions, the maps $T_F : \mathbf{B} \to \mathbf{F}$ and $T_G : \mathbf{B} \to \mathbf{G}$ are also assumed to take
values on general Banach spaces $\mathbf{F}$ and $\mathbf{G}$ respectively. While no further structure on
$\mathbf{F}$ is needed for testing “equality” restrictions, the analysis of “inequality” restrictions
necessitates that $\mathbf{G}$ be equipped with a partial ordering - i.e. that “$\le$” be well defined
in (4). We thus impose the following requirements on $\Theta$, and the maps $T_F$ and $T_G$.

Assumption 2.1. (i) $\Theta \subseteq \mathbf{B}$, where $\mathbf{B}$ is a Banach space with norm $\|\cdot\|_{\mathbf{B}}$.

Assumption 2.2. (i) $T_F : \mathbf{B} \to \mathbf{F}$ and $T_G : \mathbf{B} \to \mathbf{G}$, where $\mathbf{F}$ is a Banach space with
norm $\|\cdot\|_{\mathbf{F}}$, and $\mathbf{G}$ is an AM space with order unit $\mathbf{1}_{\mathbf{G}}$ and norm $\|\cdot\|_{\mathbf{G}}$; (ii) The
maps $T_F : \mathbf{B} \to \mathbf{F}$ and $T_G : \mathbf{B} \to \mathbf{G}$ are continuous under $\|\cdot\|_{\mathbf{B}}$.

Assumption 2.1 formalizes the requirement that the parameter space $\Theta$ be a subset
of a Banach space $\mathbf{B}$. In turn, Assumption 2.2(i) similarly imposes that $T_F$ take values
in a Banach space $\mathbf{F}$, while the map $T_G$ is required to take values in an AM space $\mathbf{G}$ -
since AM spaces are not often used in econometrics, we provide an overview in Appendix
A. Heuristically, the essential implications of Assumption 2.2(i) for $\mathbf{G}$ are that: (i) $\mathbf{G}$
is a vector space equipped with a partial order relationship “$\le$”; (ii) The partial order
“$\le$” and the vector space operations interact in the same manner they do on $\mathbf{R}$;$^2$ and
(iii) The order unit $\mathbf{1}_{\mathbf{G}} \in \mathbf{G}$ is an element such that for any $\theta \in \mathbf{G}$ there exists a scalar
$\lambda > 0$ satisfying $|\theta| \le \lambda \mathbf{1}_{\mathbf{G}}$; see Remark 2.1 for an example. Finally, we note that in
Assumption 2.2(ii) the maps $T_F$ and $T_G$ are required to be continuous, which ensures
that the set $R$ is closed in $\mathbf{B}$. Since the choice of maps $T_F$ and $T_G$ is dictated by the
hypothesis of interest, verifying Assumption 2.2(ii) is often accomplished by restricting
$\mathbf{B}$ to have a sufficiently “strong” norm that ensures continuity.


Remark 2.1. In applications we will often work with the space of continuous functions
with bounded derivatives. Formally, for a set $A \subseteq \mathbf{R}^d$, a function $f : A \to \mathbf{R}$, a vector
of nonnegative integers $\alpha = (\alpha_1, \ldots, \alpha_d)$, and $|\alpha| = \sum_{i=1}^d \alpha_i$ we denote

$$D^\alpha f(a_0) = \frac{\partial^{|\alpha|}}{\partial a_1^{\alpha_1} \cdots \partial a_d^{\alpha_d}} f(a) \Big|_{a = a_0} . \qquad (5)$$

For a nonnegative integer $m$, we may then define the space $C^m(A)$ to be given by

$$C^m(A) = \{f : D^\alpha f \text{ is continuous and bounded on } A \text{ for all } |\alpha| \le m\} , \qquad (6)$$

which we endow with the norm $\|f\|_{m,\infty} = \max_{|\alpha| \le m} \sup_{a \in A} |D^\alpha f(a)|$. The space $C^0(A)$
with norm $\|f\|_{0,\infty}$ - which we denote $C(A)$ and $\|\cdot\|_\infty$ for simplicity - is then an AM space.
In particular, equipping $C(A)$ with the ordering $f_1 \le f_2$ if and only if $f_1(a) \le f_2(a)$ for
all $a \in A$ implies the constant function $\mathbf{1}(a) = 1$ for all $a \in A$ is an order unit. ■
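As a rough numerical illustration of Remark 2.1, the sketch below approximates the norm $\|f\|_{m,\infty}$ on a grid and checks the order unit property $|f| \le \lambda \mathbf{1}$ with $\lambda = \|f\|_\infty$. The function $f(a) = \sin(2a)$, the set $A = [0, 1]$, the grid, and all helper names are assumptions chosen for illustration only; a grid maximum is merely a stand-in for the supremum.

```python
import math

def sup_norm(values):
    """Grid stand-in for a sup norm: the largest absolute value observed."""
    return max(abs(v) for v in values)

def c_m_norm(derivatives, grid):
    """Approximate ||f||_{m,inf}: the maximum over derivative orders of sup_a |D^alpha f(a)|.

    `derivatives` is the list [f, f', ..., f^(m)] of callables (hypothetical inputs).
    """
    return max(sup_norm([d(a) for a in grid]) for d in derivatives)

# Illustrative choice (an assumption): f(a) = sin(2a) on A = [0, 1], with m = 1.
grid = [i / 1000 for i in range(1001)]
f = lambda a: math.sin(2 * a)
df = lambda a: 2 * math.cos(2 * a)

norm_0 = c_m_norm([f], grid)       # approximates ||f||_inf
norm_1 = c_m_norm([f, df], grid)   # approximates ||f||_{1,inf}

# Order unit check: |f(a)| <= lambda * 1(a) on the grid, with lambda = ||f||_inf.
lam = norm_0
is_dominated = all(abs(f(a)) <= lam * 1.0 for a in grid)
```

Here the derivative dominates, so the $C^1$ norm exceeds the sup norm of $f$ itself, which is the sense in which $\|\cdot\|_{1,\infty}$ is a “stronger” norm.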


2.2 Motivating Examples 

In order to illustrate the relevance of the introduced framework, we next discuss a 
number of applications based on well known models. For conciseness, we keep the 
discussion brief and revisit these examples in more detail in Appendix F. 

We draw our first example from a long-standing literature aiming to replace parametric
assumptions with shape restrictions implied by economic theory (Matzkin, 1994).

2 For example, if $\theta_1 \le \theta_2$, then $\theta_1 + \theta_3 \le \theta_2 + \theta_3$ for any $\theta_3 \in \mathbf{G}$.

Example 2.1. (Shape Restricted Demand). Blundell et al. (2012) examine a semiparametric
model for gasoline demand, in which quantity demanded $Q_i$ given price $P_i$,
income $Y_i$, and demographic characteristics $W_i \in \mathbf{R}^{d_w}$ is assumed to satisfy

$$Q_i = g_0(P_i, Y_i) + W_i'\gamma_0 + U_i . \qquad (7)$$

The authors propose a kernel estimator for the function $g_0 : \mathbf{R}^2_+ \to \mathbf{R}$ under the
assumption $E[U_i \mid P_i, Y_i, W_i] = 0$ and the hypothesis that $g_0$ obeys the Slutsky restriction


$$\frac{\partial}{\partial p} g_0(p, y) + g_0(p, y) \frac{\partial}{\partial y} g_0(p, y) \le 0 . \qquad (8)$$


While in their application Blundell et al. (2012) find imposing (8) to be empirically
important, their asymptotic framework assumes (8) holds strictly and thus implies the
constrained and unconstrained estimators are asymptotically equivalent. In contrast, our
results will enable us to test, for example, for $(p_0, y_0) \in \mathbf{R}^2_+$ and $c_0 \in \mathbf{R}$ the hypothesis

$$H_0 : g_0(p_0, y_0) = c_0 \qquad H_1 : g_0(p_0, y_0) \neq c_0 \qquad (9)$$


employing an asymptotic analysis that is able to capture the finite sample importance
of imposing the Slutsky restriction. To map this problem into our framework, we set
$\mathbf{B} = C^1(\mathbf{R}^2_+) \times \mathbf{R}^{d_w}$, $J = 1$, $Z_i = (P_i, Y_i, W_i)$, $X_i = (Q_i, Z_i)$ and
$\rho(X_i, \theta) = Q_i - g(P_i, Y_i) - W_i'\gamma$ for any $\theta = (g, \gamma) \in \mathbf{B}$. Letting $\mathbf{F} = \mathbf{R}$ and
defining $T_F : \mathbf{B} \to \mathbf{F}$ by $T_F(\theta) = g(p_0, y_0) - c_0$ for any $\theta \in \mathbf{B}$ enables us to test (9),
while the Slutsky restriction can be imposed by setting $\mathbf{G} = C(\mathbf{R}^2_+)$ and defining
$T_G : \mathbf{B} \to \mathbf{G}$ to be given by

$$T_G(\theta)(p, y) = \frac{\partial}{\partial p} g(p, y) + g(p, y) \frac{\partial}{\partial y} g(p, y) \qquad (10)$$

for any $\theta \in \mathbf{B}$. Alternatively, we may also conduct inference on deadweight loss as
considered in Blundell et al. (2012) building on Hausman and Newey (1995), or allow
for endogeneity and quantile restrictions on $U_i$ as pursued by Blundell et al. (2013). ■
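To make the maps of Example 2.1 concrete, the following sketch evaluates the equality map $T_F(\theta) = g(p_0, y_0) - c_0$ and a finite-difference approximation of the Slutsky map in (10) on a grid. The demand family $g(p, y) = a y / p$, the grid, the step size, and the function names are all hypothetical choices made for illustration; for this family the Slutsky expression equals $a(a-1)y/p^2$, so the restriction holds when $a \le 1$.

```python
def equality_map(g, p0, y0, c0):
    """T_F(theta) = g(p0, y0) - c0, which is zero under the null hypothesis in (9)."""
    return g(p0, y0) - c0

def slutsky_map(g, p, y, h=1e-5):
    """Central finite-difference approximation of T_G(theta)(p, y) = dg/dp + g * dg/dy."""
    dg_dp = (g(p + h, y) - g(p - h, y)) / (2 * h)
    dg_dy = (g(p, y + h) - g(p, y - h)) / (2 * h)
    return dg_dp + g(p, y) * dg_dy

def satisfies_slutsky(g, grid, tol=1e-8):
    """Check T_G(theta) <= 0 on a finite grid (a stand-in for the whole domain)."""
    return all(slutsky_map(g, p, y) <= tol for p, y in grid)

def make_demand(a):
    """Hypothetical demand family g(p, y) = a * y / p, used only for illustration."""
    return lambda p, y: a * y / p

grid = [(1 + 0.5 * i, 1 + 0.5 * j) for i in range(5) for j in range(5)]
ok_demand = satisfies_slutsky(make_demand(0.5), grid)    # a <= 1: restriction holds
bad_demand = satisfies_slutsky(make_demand(1.5), grid)   # a > 1: restriction fails
level_gap = equality_map(make_demand(0.5), 2.0, 4.0, 1.0)
```

A parameter belongs to the restriction set $R$ only when both checks pass, that is, when the equality map is zero and the inequality map is nonpositive everywhere.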


Our next example builds on Example 2.1 by illustrating how to exploit shape restrictions
in a regression discontinuity (RD) setting; see also Armstrong (2015).$^3$

3 We thank Pat Kline for suggesting this example.

Example 2.2. (Monotonic RD). We consider a sharp design in which treatment is
assigned whenever a forcing variable $R_i \in \mathbf{R}$ is above a threshold which we normalize
to zero. For an outcome variable $Y_i$ and a treatment indicator $D_i = 1\{R_i \ge 0\}$, Hahn
et al. (2001) showed the average treatment effect $\tau_0$ at zero is identified by

$$\tau_0 = \lim_{r \downarrow 0} E[Y_i \mid R_i = r] - \lim_{r \uparrow 0} E[Y_i \mid R_i = r] . \qquad (11)$$

In a number of applications it is additionally reasonable to assume $E[Y_i \mid R_i = r]$ is
monotonic in a neighborhood of, but not necessarily at, zero. Such a restriction is natural,
for instance, in Lee et al. (2004) where $R_i$ is the democratic vote share and $Y_i$ is a measure
of how liberal the elected official’s voting record is, or in Black et al. (2007) where $R_i$ and
$Y_i$ are respectively measures of predicted and actual collected unemployment benefits.
In order to illustrate the applicability of our framework to this setting, we suppose we
wish to impose monotonicity of $E[Y_i \mid R_i = r]$ on $r \in [-1, 0)$ and $r \in [0, 1]$ while testing

$$H_0 : \tau_0 = 0 \qquad H_1 : \tau_0 \neq 0 . \qquad (12)$$

To this end, we let $\mathbf{B} = C^1([-1, 0]) \times C^1([0, 1])$, $X_i = (Y_i, R_i, D_i)$, $Z_i = (R_i, D_i)$, and
for $J = 1$ set $\rho(x, \theta) = y - g_-(r)(1 - d) - g_+(r)d$ which yields the restriction

$$E[Y_i - g_-(R_i)(1 - D_i) - g_+(R_i)D_i \mid R_i, D_i] = 0 \qquad (13)$$

for any $\theta = (g_-, g_+) \in \mathbf{B}$. The functions $g_-$ and $g_+$ are then respectively identified by
$E[Y_i \mid R_i = r]$ for $r \in [-1, 0)$ and $r \in [0, 1]$, and hence we may test (12) by setting $\mathbf{F} = \mathbf{R}$
and $T_F(\theta) = g_+(0) - g_-(0)$ for any $\theta = (g_-, g_+) \in \mathbf{B}$. In turn, monotonicity can be
imposed by setting $\mathbf{G} = C([-1, 0]) \times C([0, 1])$ and letting $T_G(\theta) = (-g_-', -g_+')$ for any
$\theta = (g_-, g_+) \in \mathbf{B}$. A similar construction can also be applied in fuzzy RD designs or the
regression kink design studied in Card et al. (2012) and Calonico et al. (2014). ■
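The restriction set of Example 2.2 can be sketched in the same spirit. The code below evaluates $T_F(\theta) = g_+(0) - g_-(0)$ and $T_G(\theta) = (-g_-', -g_+')$ on grids and reports whether a candidate $\theta$ lies in $R$. The candidate functions, their hand-supplied derivatives, the grids, and the helper names are hypothetical; monotonicity is read here as nondecreasing, in line with the sign convention of $T_G$.

```python
def equality_map(g_minus, g_plus):
    """T_F(theta) = g_+(0) - g_-(0): the jump at the threshold."""
    return g_plus(0.0) - g_minus(0.0)

def inequality_map(dg_minus, dg_plus, left_grid, right_grid):
    """T_G(theta) = (-g_-', -g_+') evaluated on grids; nonpositive values mean nondecreasing."""
    return [-dg_minus(r) for r in left_grid] + [-dg_plus(r) for r in right_grid]

def in_restriction_set(theta, left_grid, right_grid, tol=1e-12):
    """Check T_F(theta) = 0 and T_G(theta) <= 0 for theta = (g_-, g_+, g_-', g_+')."""
    g_minus, g_plus, dg_minus, dg_plus = theta
    no_jump = abs(equality_map(g_minus, g_plus)) <= tol
    values = inequality_map(dg_minus, dg_plus, left_grid, right_grid)
    return no_jump and all(v <= tol for v in values)

left_grid = [-1.0 + i / 10 for i in range(11)]
right_grid = [i / 10 for i in range(11)]

# Hypothetical candidates: (g_-, g_+, derivative of g_-, derivative of g_+).
theta_null = (lambda r: 1.0 + r, lambda r: 1.0 + 2.0 * r, lambda r: 1.0, lambda r: 2.0)
theta_jump = (lambda r: 1.0 + r, lambda r: 1.5 + 2.0 * r, lambda r: 1.0, lambda r: 2.0)
theta_dip = (lambda r: 1.0 + r, lambda r: 1.0 - r, lambda r: 1.0, lambda r: -1.0)
```

Only `theta_null` satisfies both restrictions: `theta_jump` violates the equality constraint, while `theta_dip` violates monotonicity to the right of the threshold.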

4 Here, with some abuse of notation, we identify $g_- \in C([-1, 0])$ with the function
$E[Y_i \mid R_i = r]$ on $r \in [-1, 0)$ by letting $g_-(0) = \lim_{r \uparrow 0} E[Y_i \mid R_i = r]$, which exists by
assumption.

While Examples 2.1 and 2.2 concern imposing shape restrictions to conduct inference
on functionals, in certain applications interest instead lies in the shape restriction itself.
The following example is based on a model originally employed by Gentzkow (2007) in
examining whether print and online newspapers are substitutes or complements.

Example 2.3. (Complementarities). Suppose an agent can buy at most one each
of two goods $j \in \{1, 2\}$, and let $a = (a_1, a_2) \in \{(0, 0), (1, 0), (0, 1), (1, 1)\}$ denote the
possible bundles to be purchased. We consider a random utility model

$$U(a, Z_i, \epsilon_i) = \sum_{j=1}^2 (W_i'\gamma_{0,j} + \epsilon_{i,j}) 1\{a_j = 1\} + \delta_0(Y_i) 1\{a_1 = 1, a_2 = 1\} \qquad (14)$$

where $Z_i = (W_i, Y_i)$ are observed covariates, $Y_i \in \mathbf{R}^{d_y}$ can be a subvector of $W_i \in \mathbf{R}^{d_w}$,
$\delta_0 \in C(\mathbf{R}^{d_y})$ is an unknown function, and $\epsilon_i = (\epsilon_{i,1}, \epsilon_{i,2})$ follows a parametric distribution
$G(\cdot \mid \alpha_0)$ with $\alpha_0 \in \mathbf{R}^{d_\alpha}$; see Fox and Lazzati (2014) for identification results. In (14),
$\delta_0 \in C(\mathbf{R}^{d_y})$ determines whether the goods are complements or substitutes and we may
consider, for example, a test of the hypothesis that they are always substitutes

$$H_0 : \delta_0(y) \le 0 \text{ for all } y \qquad H_1 : \delta_0(y) > 0 \text{ for some } y . \qquad (15)$$

In this instance, $\mathbf{B} = \mathbf{R}^{2 d_w + d_\alpha} \times C(\mathbf{R}^{d_y})$, and for any $\theta = (\gamma_1, \gamma_2, \alpha, \delta) \in \mathbf{B}$ we map
(15) into our framework by letting $\mathbf{G} = C(\mathbf{R}^{d_y})$, $T_G(\theta) = \delta$, and imposing no equality
restrictions. For observed choices $A_i$, the conditional moment restrictions are then

$$P(A_i = (1, 0) \mid Z_i) = \int 1\{W_i'\gamma_1 + \epsilon_1 \ge 0, \; W_i'\gamma_2 + \delta(Y_i) + \epsilon_2 \le 0, \; W_i'\gamma_1 + \epsilon_1 \ge W_i'\gamma_2 + \epsilon_2\} \, dG(\epsilon \mid \alpha) \qquad (16)$$

and, exploiting $\delta(Y_i) \le 0$ under the null hypothesis, the two additional conditions

$$P(A_i = (0, 0) \mid Z_i) = \int 1\{\epsilon_1 \le -W_i'\gamma_1, \; \epsilon_2 \le -W_i'\gamma_2\} \, dG(\epsilon \mid \alpha) \qquad (17)$$

$$P(A_i = (1, 1) \mid Z_i) = \int 1\{\epsilon_1 + \delta(Y_i) \ge -W_i'\gamma_1, \; \epsilon_2 + \delta(Y_i) \ge -W_i'\gamma_2\} \, dG(\epsilon \mid \alpha) \qquad (18)$$

so that in this model $J = 3$. An analogous approach may also be employed to conduct
inference on interaction effects in discrete games as in De Paula and Tang (2012). ■

The introduced framework can also be employed to study semiparametric specifications
in conditional moment (in)equality models - thus complementing a literature
that, with the notable exception of Chernozhukov et al. (2013), has been largely
parametric (Andrews and Shi, 2013). Our final example illustrates such an application in
the context of a study of hospital referrals by Ho and Pakes (2014).

Example 2.4. (Testing Parameter Components in Moment (In)Equalities).
We consider the problem of estimating how an insurer assigns patients to hospitals within
its network $\mathcal{H}$. Suppose each observation $i$ consists of two individuals $j \in \{1, 2\}$ of similar
characteristics for whom we know the hospital $H_{ij} \in \mathcal{H}$ to which they were referred,
as well as the cost of treatment $P_{ij}(h)$ and the distance $D_{ij}(h)$ to any hospital $h \in \mathcal{H}$.
Under certain assumptions, Ho and Pakes (2014) then derive the moment restriction

$$E\Big[\sum_{j=1}^2 \{\gamma_0 (P_{ij}(H_{ij}) - P_{ij}(H_{ij'})) + g_0(D_{ij}(H_{ij})) - g_0(D_{ij}(H_{ij'}))\} \,\Big|\, Z_i\Big] \le 0 \qquad (19)$$

where $\gamma_0 \in \mathbf{R}$ denotes the insurer’s sensitivity to price, $g_0 : \mathbf{R}_+ \to \mathbf{R}_+$ is an unknown
monotonically increasing function reflecting a preference for referring patients to nearby
hospitals, $Z_i \in \mathbf{R}^{d_z}$ is an appropriate instrument, and $j' = \{1, 2\} \setminus \{j\}$.$^5$ Employing our
proposed framework we may, for example, test for some $c_0 \in \mathbf{R}$ the null hypothesis

$$H_0 : \gamma_0 = c_0 \qquad H_1 : \gamma_0 \neq c_0 \qquad (20)$$

without imposing parametric restrictions on $g_0$ but instead requiring it to be monotone.
To this end, let $X_i = (\{\{P_{ij}(h), D_{ij}(h)\}_{h \in \mathcal{H}}, H_{ij}\}_{j=1}^2, Z_i)$, and define the function

$$\psi(X_i, \gamma, g) = \sum_{j=1}^2 \{\gamma (P_{ij}(H_{ij}) - P_{ij}(H_{ij'})) + g(D_{ij}(H_{ij})) - g(D_{ij}(H_{ij'}))\} \qquad (21)$$

for any $(\gamma, g) \in \mathbf{R} \times C^1(\mathbf{R}_+)$. The moment restriction in (19) can then be rewritten as

$$E[\psi(X_i, \gamma_0, g_0) + \lambda_0(Z_i) \mid Z_i] = 0 \qquad (22)$$

for some unknown function $\lambda_0$ satisfying $\lambda_0(Z_i) \ge 0$. Thus, (19) may be viewed as a
conditional moment restriction model with parameter space $\mathbf{B} = \mathbf{R} \times C^1(\mathbf{R}_+) \times \ell^\infty(\mathbf{R}^{d_z})$
in which $\rho(x, \theta) = \psi(x, \gamma, g) + \lambda(z)$ for any $\theta = (\gamma, g, \lambda) \in \mathbf{B}$. The monotonicity
restriction on $g$ and positivity requirement on $\lambda$ can in turn be imposed by setting
$\mathbf{G} = \ell^\infty(\mathbf{R}_+) \times \ell^\infty(\mathbf{R}^{d_z})$ and $T_G(\theta) = -(g', \lambda)$, while the null hypothesis in (20) may
be tested by letting $\mathbf{F} = \mathbf{R}$ and defining $T_F(\theta) = \gamma - c_0$ for any $\theta = (\gamma, g, \lambda) \in \mathbf{B}$. An
analogous construction can similarly be applied to extend conditional moment inequality
models with parametric specifications to semiparametric or nonparametric ones;$^6$ see
Ciliberto and Tamer (2009), Pakes (2010) and references therein. ■

5 In other words, $j' = 2$ when $j = 1$, and $j' = 1$ when $j = 2$.


3 Basic Setup 

Having formally stated the hypotheses we consider, we next develop a test statistic and 
introduce basic notation and assumptions that will be employed throughout the paper. 

3.1 Test Statistic 

We test the null hypothesis in (3) by employing a sieve-GMM statistic that may be
viewed as a generalization of the overidentification test of Sargan (1958) and Hansen
(1982). Specifically, for the instrument $Z_{i,j}$ of the $j$th moment restriction, we consider a
set of transformations $\{q_{k,n,j}\}_{k=1}^{k_{n,j}}$ and let $q_{n,j}^{k_{n,j}}(z_j) = (q_{1,n,j}(z_j), \ldots, q_{k_{n,j},n,j}(z_j))'$. Setting
$Z_i = (Z_{i,1}', \ldots, Z_{i,J}')'$ to equal the vector of all instruments, $k_n = \sum_{j=1}^J k_{n,j}$ the total
number of transformations, $q_n^{k_n}(z) = (q_{n,1}^{k_{n,1}}(z_1)', \ldots, q_{n,J}^{k_{n,J}}(z_J)')'$ the vector of all
transformations, and $\rho(x, \theta) = (\rho_1(x, \theta), \ldots, \rho_J(x, \theta))'$ the vector of all generalized residuals,
we then construct for each $\theta \in \Theta$ the $k_n \times 1$ vector of scaled sample moments

$$\frac{1}{\sqrt{n}} \sum_{i=1}^n \rho(X_i, \theta) * q_n^{k_n}(Z_i) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \big(\rho_1(X_i, \theta) q_{n,1}^{k_{n,1}}(Z_{i,1})', \ldots, \rho_J(X_i, \theta) q_{n,J}^{k_{n,J}}(Z_{i,J})'\big)' \qquad (23)$$

where for partitioned vectors $a$ and $b$, $a * b$ denotes their Khatri-Rao product$^7$ - i.e. the
vector in (23) consists of the scaled sample averages of the product of each generalized
residual $\rho_j(X_i, \theta)$ with the transformations of its respective instrument $Z_{i,j}$. Clearly, if
(23) is evaluated at a parameter $\theta$ in the identified set $\Theta_0(P)$, then its mean will be zero.
As noted by Newey (1985), however, for any fixed dimension $k_n$ the expectation of (23)
may still be zero even if $\theta \notin \Theta_0(P)$. For this reason, we conduct an asymptotic analysis
in which $k_n$ diverges to infinity, and note the choice of transformations $\{q_{k,n,j}\}_{k=1}^{k_{n,j}}$ is
allowed to depend on $n$ to accommodate the use of splines or wavelets.

6 Alternatively, through test inversion we may employ the framework of this example to construct
confidence regions for functionals of a semi or non-parametric identified set (Romano and Shaikh, 2008).

Intuitively, we test (3) by examining whether there is a parameter $\theta \in \Theta$ satisfying
the hypothesized restrictions and such that (23) has mean zero. To this end, for any
$r \ge 2$, vector $a = (a^{(1)}, \ldots, a^{(d)})'$, and $d \times d$ positive definite matrix $A$ we define

$$\|a\|_{A,r} = \|Aa\|_r , \qquad \|a\|_r^r = \sum_{i=1}^d |a^{(i)}|^r , \qquad (24)$$

with the usual modification $\|a\|_\infty = \max_{1 \le i \le d} |a^{(i)}|$. For any possibly random $k_n \times k_n$
positive definite matrix $\hat\Sigma_n$, we then construct a function $Q_n : \Theta \to \mathbf{R}_+$ by

$$Q_n(\theta) = \Big\| \frac{1}{\sqrt{n}} \sum_{i=1}^n \rho(X_i, \theta) * q_n^{k_n}(Z_i) \Big\|_{\hat\Sigma_n, r} . \qquad (25)$$

Heuristically, the criterion $Q_n$ should diverge to infinity when evaluated at any $\theta \notin \Theta_0(P)$
and remain “stable” when evaluated at a $\theta \in \Theta_0(P)$. We therefore employ the minimum
of $Q_n$ over $R$ to examine whether there exists a $\theta$ that simultaneously makes $Q_n$ “stable”
($\theta \in \Theta_0(P)$) and satisfies the conjectured restriction ($\theta \in R$). Formally, we employ

$$I_n(R) = \inf_{\theta \in \Theta_n \cap R} Q_n(\theta) , \qquad (26)$$

where $\Theta_n \cap R$ is a sieve for $\Theta \cap R$ - i.e. $\Theta_n \cap R$ is a finite dimensional subset of $\Theta \cap R$ that
grows dense in $\Theta \cap R$. Since the choice of $\Theta_n \cap R$ depends on $\Theta \cap R$, we leave it unspecified
though note common choices include flexible finite dimensional specifications, such as
splines, polynomials, wavelets, and neural networks; see Chen (2007).
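The construction in (23)-(26) can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it forms the Khatri-Rao product block by block, evaluates $Q_n(\theta)$ with a fixed weighting matrix, and approximates $I_n(R)$ by minimizing over a finite list of candidates that stands in for the sieve $\Theta_n \cap R$. The toy design ($J = 1$, a scalar $\theta$, exactly linear data, identity weights) and every function name are assumptions made for illustration.

```python
import math

def kron(a, b):
    """Kronecker product of two vectors stored as lists."""
    return [a_i * b_k for a_i in a for b_k in b]

def khatri_rao(a_blocks, b_blocks):
    """Block-wise Kronecker product ((a_1 (x) b_1)', ..., (a_J (x) b_J)')'."""
    out = []
    for a_j, b_j in zip(a_blocks, b_blocks):
        out.extend(kron(a_j, b_j))
    return out

def weighted_norm(a, weight, r=2):
    """||a||_{A,r} = ||A a||_r, with the matrix A given as a list of rows."""
    transformed = [sum(w * v for w, v in zip(row, a)) for row in weight]
    return sum(abs(v) ** r for v in transformed) ** (1.0 / r)

def criterion(theta, data, rho, q, weight, r=2):
    """Q_n(theta) from (25): weighted norm of the scaled sample moments in (23)."""
    n = len(data)
    moments = None
    for x, z in data:
        # rho(x, theta) holds J scalar residuals; q(z) holds J instrument blocks.
        term = khatri_rao([[res] for res in rho(x, theta)], q(z))
        moments = term if moments is None else [m + t for m, t in zip(moments, term)]
    scaled = [m / math.sqrt(n) for m in moments]
    return weighted_norm(scaled, weight, r)

def sieve_statistic(candidates, data, rho, q, weight, r=2):
    """I_n(R) from (26), minimizing over a finite list standing in for the sieve."""
    return min(criterion(theta, data, rho, q, weight, r) for theta in candidates)

# Toy design (hypothetical): J = 1, X_i = (Y_i, W_i), Z_i = W_i, and Y_i = 2 W_i exactly.
data = [((2.0 * w, w), w) for w in [0.5, 1.0, 1.5, 2.0]]
rho = lambda x, theta: [x[0] - theta * x[1]]   # one generalized residual
q = lambda z: [[1.0, z]]                       # two transformations of the instrument
identity = [[1.0, 0.0], [0.0, 1.0]]

stat_loose = sieve_statistic([0.0, 1.0, 2.0, 3.0], data, rho, q, identity)  # R = {theta <= 3}
stat_tight = sieve_statistic([0.0, 0.5, 1.0], data, rho, q, identity)       # R = {theta <= 1}
```

In this toy design the true parameter is $\theta = 2$, so the statistic is zero when $R$ contains it and strictly positive when the restriction excludes it, which is the behavior the test exploits.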

3.2 Notation and Assumptions 
3.2.1 Notation 

Before stating our next set of assumptions, we introduce basic notation that we employ
throughout the paper. For conciseness, we let $V_i = (X_i', Z_i')'$ and succinctly refer to the
set of $P \in \mathbf{P}$ satisfying the null hypothesis in (3) by employing the notation

$$\mathbf{P}_0 = \{P \in \mathbf{P} : \Theta_0(P) \cap R \neq \emptyset\} . \qquad (27)$$

7 For partitioned vectors $a = (a_1', \ldots, a_J')'$ and $b = (b_1', \ldots, b_J')'$, $a * b = ((a_1 \otimes b_1)', \ldots, (a_J \otimes b_J)')'$.

We also view any $d \times d$ matrix $A$ as a map from $\mathbf{R}^d$ to $\mathbf{R}^d$, and note that when $\mathbf{R}^d$ is
equipped with the norm $\|\cdot\|_r$ it induces on $A$ the operator norm $\|\cdot\|_{o,r}$ given by

$$\|A\|_{o,r} = \sup_{a \in \mathbf{R}^d : \|a\|_r = 1} \|Aa\|_r . \qquad (28)$$

For instance, $\|A\|_{o,1}$ corresponds to the maximum absolute column sum of
$A$, and $\|A\|_{o,2}$ corresponds to the square root of the largest eigenvalue of $A'A$.
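The two closed forms for the operator norm in (28) can be checked numerically in dimension two. The sketch below, with a hypothetical $2 \times 2$ matrix and helper names introduced only for illustration, compares each closed form with a crude lower bound obtained by maximizing $\|Aa\|_r / \|a\|_r$ over sampled directions.

```python
import math

def r_norm(a, r):
    """||a||_r for a vector stored as a list."""
    return sum(abs(v) ** r for v in a) ** (1.0 / r)

def apply(A, a):
    """Matrix-vector product with A given as a list of rows."""
    return [sum(A_ik * a_k for A_ik, a_k in zip(row, a)) for row in A]

def op_norm_1(A):
    """||A||_{o,1}: the maximum absolute column sum."""
    d = len(A)
    return max(sum(abs(A[i][j]) for i in range(d)) for j in range(d))

def op_norm_2_two_by_two(A):
    """||A||_{o,2} for a 2 x 2 matrix: square root of the largest eigenvalue of A'A."""
    (a, b), (c, d) = A
    m11, m12, m22 = a * a + c * c, a * b + c * d, b * b + d * d   # entries of A'A
    trace, det = m11 + m22, m11 * m22 - m12 * m12
    largest = (trace + math.sqrt(max(trace * trace - 4 * det, 0.0))) / 2
    return math.sqrt(largest)

def sampled_op_norm(A, r, steps=3600):
    """Lower bound for ||A||_{o,r} in dimension 2 from sampled directions on the circle."""
    best = 0.0
    for s in range(steps):
        t = 2 * math.pi * s / steps
        a = [math.cos(t), math.sin(t)]
        best = max(best, r_norm(apply(A, a), r) / r_norm(a, r))
    return best

A = [[1.0, 2.0], [3.0, 4.0]]   # hypothetical example matrix
exact_1, exact_2 = op_norm_1(A), op_norm_2_two_by_two(A)
approx_1, approx_2 = sampled_op_norm(A, 1), sampled_op_norm(A, 2)
```

Because the ratio $\|Aa\|_r / \|a\|_r$ is invariant to rescaling $a$, sampling directions is equivalent to searching over the unit sphere in (28).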

Our analysis relies heavily on empirical process theory, and we therefore borrow
extensively from the literature’s notation (van der Vaart and Wellner, 1996). In particular,
for any function $f$ of $V_i$ we for brevity sometimes write its expectation as

$$P f = E_P[f(V_i)] . \qquad (29)$$

In turn, we denote the empirical process evaluated at a function $f$ by $\mathbb{G}_{n,P} f$ - i.e. set

$$\mathbb{G}_{n,P} f = \frac{1}{\sqrt{n}} \sum_{i=1}^n \{f(V_i) - P f\} . \qquad (30)$$
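As a small illustration of (29) and (30), the sketch below evaluates the empirical process at a few functions for a simulated sample. The uniform design, the seed, the sample size, and the function names are assumptions made for illustration; the population mean $Pf$ is supplied by hand because it is known in closed form here.

```python
import math
import random

def empirical_process(f, sample, mean_f):
    """G_{n,P} f = n^{-1/2} * sum_i (f(V_i) - P f), with P f supplied by the caller."""
    n = len(sample)
    return sum(f(v) - mean_f for v in sample) / math.sqrt(n)

random.seed(0)
sample = [random.random() for _ in range(400)]   # V_i ~ Uniform(0, 1), an illustrative choice

g_identity = empirical_process(lambda v: v, sample, 0.5)          # P f = 1/2
g_square = empirical_process(lambda v: v * v, sample, 1.0 / 3)    # P f = 1/3
g_const = empirical_process(lambda v: 1.0, sample, 1.0)           # constant f: exactly zero
g_sum = empirical_process(lambda v: v + v * v, sample, 0.5 + 1.0 / 3)
```

The map $f \mapsto \mathbb{G}_{n,P} f$ is linear, so `g_sum` agrees with `g_identity + g_square` up to rounding, and constants are mapped to zero.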

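We will often need to evaluate the empirical process at functions generated by the maps
$\theta \mapsto \rho_j(\cdot, \theta)$ and the sieve $\Theta_n \cap R$, and for convenience we therefore define the set

$$\mathcal{F}_n = \{\rho_j(\cdot, \theta) : \theta \in \Theta_n \cap R \text{ and } 1 \le j \le J\} . \qquad (31)$$

The “size” of $\mathcal{F}_n$ plays a crucial role, and we control it through the bracketing integral

$$J_{[\,]}(\delta, \mathcal{F}_n, \|\cdot\|_{L^2_P}) = \int_0^\delta \sqrt{1 + \log N_{[\,]}(\epsilon, \mathcal{F}_n, \|\cdot\|_{L^2_P})} \, d\epsilon , \qquad (32)$$

where $N_{[\,]}(\epsilon, \mathcal{F}_n, \|\cdot\|_{L^2_P})$ is the smallest number of brackets of size $\epsilon$ (under $\|\cdot\|_{L^2_P}$)
required to cover $\mathcal{F}_n$.$^8$ Finally, we let $\mathbb{W}_{n,P}$ denote the isonormal process on $L^2_P$ - i.e.
$\mathbb{W}_{n,P}$ is a Gaussian process satisfying for all $f, g \in L^2_P$, $E[\mathbb{W}_{n,P} f] = E[\mathbb{W}_{n,P} g] = 0$ and

$$E[\mathbb{W}_{n,P} f \, \mathbb{W}_{n,P} g] = E_P[(f(V_i) - E_P[f(V_i)])(g(V_i) - E_P[g(V_i)])] . \qquad (33)$$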

8 An $\epsilon$ bracket under $\|\cdot\|_{L^2_P}$ is a set of the form $\{f \in \mathcal{F}_n : L(v) \le f(v) \le U(v)\}$ with
$\|U - L\|_{L^2_P} < \epsilon$. Here, as usual, the $L^q_P$ spaces are defined by $L^q_P = \{f : \|f\|_{L^q_P} < \infty\}$
where $\|f\|^q_{L^q_P} = E_P[|f|^q]$.

It will prove useful to denote the vector subspace generated by the sieve $\Theta_n \cap R$ by

$$\mathbf{B}_n = \overline{\text{span}}\{\Theta_n \cap R\} , \qquad (34)$$

where $\overline{\text{span}}\{C\}$ denotes the closure under $\|\cdot\|_{\mathbf{B}}$ of the linear span of any set $C \subseteq \mathbf{B}$.
Since $\mathbf{B}_n$ will further be assumed to be finite dimensional, all well defined norms on it
will be equivalent in the sense that they generate the same topology. Hence, if $\mathbf{B}_n$ is a
subspace of two different Banach spaces $(\mathbf{A}_1, \|\cdot\|_{\mathbf{A}_1})$ and $(\mathbf{A}_2, \|\cdot\|_{\mathbf{A}_2})$, then the modulus
of continuity of $\|\cdot\|_{\mathbf{A}_1}$ with respect to $\|\cdot\|_{\mathbf{A}_2}$, which we denote by

$$S_n(\mathbf{A}_1, \mathbf{A}_2) = \sup_{b \in \mathbf{B}_n} \frac{\|b\|_{\mathbf{A}_1}}{\|b\|_{\mathbf{A}_2}} , \qquad (35)$$

will be finite for any $n$ though possibly diverging to infinity with the dimension of $\mathbf{B}_n$.
For example, if $\mathbf{B}_n \subseteq L^\infty_P$, then $S_n(L^2_P, L^\infty_P) \le 1$, while $S_n(L^\infty_P, L^2_P)$ is the smallest
constant such that $\|b\|_{L^\infty_P} \le \|b\|_{L^2_P} \times S_n(L^\infty_P, L^2_P)$ for all $b \in \mathbf{B}_n$.
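The growth of the modulus in (35) with the sieve dimension can be seen in a simple case. The sketch below assumes, purely for illustration, that $\mathbf{B}_n$ is spanned by indicators of $k$ equal-width bins on $[0, 1]$ and that $P$ is uniform, so that $b = \sum_j c_j 1\{\text{bin } j\}$ has $\|b\|_{L^\infty_P} = \max_j |c_j|$ and $\|b\|_{L^2_P} = (\sum_j c_j^2 / k)^{1/2}$. Under these assumptions the ratio is at most $\sqrt{k}$, with equality for a function supported on a single bin.

```python
import math
import random

def l2_norm(coefs):
    """L^2_P norm of a step function with equal-width bins under uniform P on [0, 1]."""
    k = len(coefs)
    return math.sqrt(sum(c * c for c in coefs) / k)

def linf_norm(coefs):
    """L^infinity_P norm of the same step function."""
    return max(abs(c) for c in coefs)

def ratio(coefs):
    """||b||_{L^infinity} / ||b||_{L^2}, the quantity maximized in (35)."""
    return linf_norm(coefs) / l2_norm(coefs)

k = 16                                  # hypothetical sieve dimension
spike = [1.0] + [0.0] * (k - 1)         # step function supported on one bin
random.seed(1)
draws = [[random.gauss(0, 1) for _ in range(k)] for _ in range(1000)]

spike_ratio = ratio(spike)                        # attains sqrt(k)
largest_random_ratio = max(ratio(c) for c in draws)
largest_reverse_ratio = max(1.0 / ratio(c) for c in draws)   # ||b||_{L^2} / ||b||_{L^infinity}
```

In this design $S_n(L^\infty_P, L^2_P) = \sqrt{k}$ diverges with the dimension, while the reverse ratio never exceeds one, in line with $S_n(L^2_P, L^\infty_P) \le 1$.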

3.2.2 Assumptions 

The following assumptions introduce a basic structure we employ throughout the paper. 

Assumption 3.1. (i) $\{V_i\}_{i=1}^\infty$ is an i.i.d. sequence with $V_i \sim P \in \mathbf{P}$.

Assumption 3.2. (i) For all $1 \le j \le J$, $\sup_{1 \le k \le k_{n,j}} \sup_{P \in \mathbf{P}} \|q_{k,n,j}\|_{L^\infty_P} \le B_n$ with
$B_n \ge 1$; (ii) The largest eigenvalue of $E_P[q_{n,j}^{k_{n,j}}(Z_{i,j}) q_{n,j}^{k_{n,j}}(Z_{i,j})']$ is bounded uniformly in
$1 \le j \le J$, $n$, and $P \in \mathbf{P}$; (iii) The dimension of $\mathbf{B}_n$ is finite for any $n$.

Assumption 3.3. The classes $\mathcal{F}_n$: (i) Are closed under $\|\cdot\|_{L^2_P}$; (ii) Have envelope $F_n$
with $\sup_{P \in \mathbf{P}} E_P[F_n^2(V_i)] < \infty$; (iii) Satisfy $\sup_{P \in \mathbf{P}} J_{[\,]}(\|F_n\|_{L^2_P}, \mathcal{F}_n, \|\cdot\|_{L^2_P}) \le J_n$.

Assumption 3.4. (i) For each $P \in \mathbf{P}$ there is a $\Sigma_n(P) > 0$ with $\|\hat\Sigma_n - \Sigma_n(P)\|_{o,r} = o_p(1)$
uniformly in $P \in \mathbf{P}$; (ii) The matrices $\Sigma_n(P)$ are invertible for all $n$ and $P \in \mathbf{P}$;
(iii) $\|\Sigma_n(P)\|_{o,r}$ and $\|\Sigma_n(P)^{-1}\|_{o,r}$ are uniformly bounded in $n$ and $P \in \mathbf{P}$.

Assumption 3.1 imposes that the sample $\{V_i\}_{i=1}^n$ be i.i.d. with $P$ belonging to a
set of distributions $\mathbf{P}$ over which our results will hold uniformly. In Assumption 3.2(i)
we require the functions $\{q_{k,n,j}\}_{k=1}^{k_{n,j}}$ to be bounded by a constant $B_n$ possibly diverging
to infinity with the sample size. Hence, Assumption 3.2(i) accommodates both
transformations that are uniformly bounded in $n$, such as trigonometric series, and those
with diverging bound, such as b-splines, wavelets, and orthogonal polynomials (after
orthonormalization). The bound on eigenvalues imposed in Assumption 3.2(ii) guarantees
that $\{q_{k,n,j}\}_{k=1}^{k_{n,j}}$ are Bessel sequences uniformly in $n$, while Assumption 3.2(iii) formalizes
that the sieve $\Theta_n \cap R$ be finite dimensional. In turn, Assumption 3.3 controls the “size”
of the class $\mathcal{F}_n$, which is crucial in studying the induced empirical process. We note that
the entropy integral is allowed to diverge with the sample size and thus accommodates
non-compact parameter spaces $\Theta$ as in Chen and Pouzo (2012). Alternatively, if the
class $\mathcal{F} = \bigcup_{n=1}^\infty \mathcal{F}_n$ is restricted to be Donsker, then Assumptions 3.3(ii)-(iii) can hold
with uniformly bounded $J_n$ and $\|F_n\|_{L^2_P}$. Finally, Assumption 3.4 imposes requirements
on the weighting matrix $\hat{\Sigma}_n$ - namely, that it converge to an invertible matrix $\Sigma_n(P)$
possibly depending on $P$. Assumption 3.4 can of course be automatically satisfied under
nonstochastic weights.


4 Rate of Convergence 

As a preliminary step towards approximating the finite sample distribution of $I_n(R)$, we
first aim to characterize the asymptotic behavior of the minimizers of $Q_n$ on $\Theta_n \cap R$.
Specifically, for any sequence $\tau_n \downarrow 0$ we study the probability limit of the set

$$\hat{\Theta}_n \cap R = \Big\{\theta \in \Theta_n \cap R : \frac{1}{\sqrt{n}}Q_n(\theta) \leq \inf_{\theta \in \Theta_n \cap R} \frac{1}{\sqrt{n}}Q_n(\theta) + \tau_n\Big\} \;, \tag{36}$$

which constitutes the set of exact ($\tau_n = 0$) or near ($\tau_n > 0$) minimizers of $Q_n$. We
study the general case with $\tau_n \downarrow 0$ because results for both exact and near minimizers
are needed in our analysis. In particular, the set of exact and near minimizers will be
employed to respectively characterize and estimate the distribution of $I_n(R)$.
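
The construction in (36) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the constrained sieve has been discretized into a finite grid and that the criterion $Q_n$ has already been evaluated on it; the function name, the grid, and the toy criterion are hypothetical and not part of the procedure described in the paper.

```python
import numpy as np

def near_minimizers(theta_grid, q_values, n, tau_n):
    """Grid points whose scaled criterion Q_n / sqrt(n) is within tau_n of its minimum.

    theta_grid : candidate parameters in the constrained sieve (hypothetical grid)
    q_values   : Q_n(theta) evaluated at each grid point
    n          : sample size
    tau_n      : slack; tau_n = 0 keeps only the exact minimizers on the grid
    """
    scaled = np.asarray(q_values, dtype=float) / np.sqrt(n)
    keep = scaled <= scaled.min() + tau_n
    return np.asarray(theta_grid)[keep]

# Toy criterion (an assumption for illustration): Q_n(theta) = sqrt(n) * |theta - 0.5|.
n = 400
grid = np.linspace(0.0, 1.0, 101)
q = np.sqrt(n) * np.abs(grid - 0.5)

exact = near_minimizers(grid, q, n, tau_n=0.0)    # exact minimizers
near = near_minimizers(grid, q, n, tau_n=0.055)   # near minimizers
```

With $\tau_n = 0$ only the grid minimizer survives, while a positive $\tau_n$ returns a neighborhood of it, mirroring the distinction between exact and near minimizers.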

While it is natural to view $\Theta_0(P) \cap R$ as the candidate probability limit for $\hat{\Theta}_n \cap R$,
it is in fact more fruitful to instead consider $\hat{\Theta}_n \cap R$ as consistent for the sets$^{9}$

$$\Theta_{0n}(P) \cap R = \arg\min_{\theta \in \Theta_n \cap R} \|E_P[\rho(X_i, \theta) * q_n^{k_n}(Z_i)]\|_r \;. \tag{37}$$

Heuristically, $\Theta_{0n}(P) \cap R$ is the set of minimizers of a population version of $Q_n$ where
the number of moments $k_n$ has been fixed and the parameter space has been set to
$\Theta_n \cap R$ (instead of $\Theta \cap R$). As we will show, a suitable rate of convergence towards
$\Theta_{0n}(P) \cap R$ suffices for establishing size control, and can in fact be obtained under
weaker requirements than those needed for convergence towards $\Theta_0(P) \cap R$.

Following the literature on set estimation in finite dimensional settings (Chernozhukov
et al., 2007; Beresteanu and Molinari, 2008; Kaido and Santos, 2014), we study set
consistency under the Hausdorff metric. In particular, for any sets $A$ and $B$ we define

$$\overrightarrow{d}_H(A, B, \|\cdot\|) = \sup_{a \in A} \inf_{b \in B} \|a - b\| \tag{38}$$

$$d_H(A, B, \|\cdot\|) = \max\{\overrightarrow{d}_H(A, B, \|\cdot\|), \overrightarrow{d}_H(B, A, \|\cdot\|)\} \;, \tag{39}$$

which respectively constitute the directed Hausdorff distance and the Hausdorff distance
under the metric $\|\cdot\|$. In contrast to finite dimensional problems, however, in the present
setting we emphasize the metric under which the Hausdorff distance is computed due
to its importance in determining a rate of convergence; see also Santos (2011).

9 Assumptions 3.3(i) and 3.3(iii) respectively imply $\mathcal{F}_n$ is closed and totally bounded under $\|\cdot\|_{L^2_P}$
and hence compact. It follows that the minimum in (37) is attained and $\Theta_{0n}(P) \cap R$ is well defined.
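
For finite sets, the distances in (38) and (39) can be computed directly. The following Python sketch is only an illustration under that finiteness assumption; the function names and the example sets are hypothetical. It also shows how the value depends on the norm under which the distance is computed.

```python
import numpy as np

def directed_hausdorff(A, B, norm=np.linalg.norm):
    """sup over a in A of inf over b in B of ||a - b||, for finite sets A and B."""
    return max(min(norm(np.subtract(a, b)) for b in B) for a in A)

def hausdorff(A, B, norm=np.linalg.norm):
    """Maximum of the two directed Hausdorff distances."""
    return max(directed_hausdorff(A, B, norm), directed_hausdorff(B, A, norm))

def sup_norm(v):
    """Sup norm on R^d, used to show the dependence on the chosen norm."""
    return np.linalg.norm(v, ord=np.inf)

A = [np.array([0.0, 0.0])]
B = [np.array([0.0, 0.0]), np.array([3.0, 4.0])]

d_AB = directed_hausdorff(A, B)        # every point of A is in B
d_BA = directed_hausdorff(B, A)        # (3, 4) is far from A
d_euclid = hausdorff(A, B)             # Euclidean norm
d_sup = hausdorff(A, B, norm=sup_norm) # sup norm gives a different value
```

The directed distance is zero whenever the first set is contained in the second, which is why results stated for the directed distance control only one direction of set inclusion.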


4.1 Consistency 


We establish the consistency of $\hat{\Theta}_n \cap R$ under the following additional assumption:

Assumption 4.1. (i) $\sup_{P \in \mathbf{P}_0} \inf_{\theta \in \Theta_n \cap R} \|E_P[\rho(X_i, \theta) * q_n^{k_n}(Z_i)]\|_r \leq \zeta_n$ for some $\zeta_n \downarrow 0$;
(ii) Let $(\Theta_{0n}(P) \cap R)^\epsilon = \{\theta \in \Theta_n \cap R : \overrightarrow{d}_H(\{\theta\}, \Theta_{0n}(P) \cap R, \|\cdot\|_{\mathbf{B}}) < \epsilon\}$ and set

$$S_n(\epsilon) = \inf_{P \in \mathbf{P}_0} \inf_{\theta \in (\Theta_n \cap R) \setminus (\Theta_{0n}(P) \cap R)^\epsilon} \|E_P[\rho(X_i, \theta) * q_n^{k_n}(Z_i)]\|_r \tag{40}$$

for any $\epsilon > 0$. Then $\{\zeta_n + k_n^{1/r}\sqrt{\log(k_n)}J_nB_n/\sqrt{n}\} = o(S_n(\epsilon))$ for any $\epsilon > 0$.

Assumption 4.1(i) requires that the sieve $\Theta_n \cap R$ be such that the infimum

$$\inf_{\theta \in \Theta_n \cap R} \|E_P[\rho(X_i, \theta) * q_n^{k_n}(Z_i)]\|_r \tag{41}$$

converges to zero uniformly over $P \in \mathbf{P}_0$. Heuristically, since for any $P \in \mathbf{P}_0$ the infimum
in (41) over the entire parameter space equals zero ($\Theta_0(P) \cap R \neq \emptyset$), Assumption 4.1(i)
can be interpreted as demanding that the sieve $\Theta_n \cap R$ provide a suitable approximation
to $\Theta \cap R$. In turn, the parameter $S_n(\epsilon)$ introduced in Assumption 4.1(ii) measures how
“well separated” the infimum in (41) is (see (40)), while the quantity

$$\frac{k_n^{1/r}\sqrt{\log(k_n)}J_nB_n}{\sqrt{n}} \tag{42}$$

represents the rate at which the scaled criterion $Q_n/\sqrt{n}$ converges to its population
analogue; see Lemma B.2. Thus, Assumption 4.1(ii) imposes that the rate at which “well
separatedness” is lost ($S_n(\epsilon) \downarrow 0$) be slower than the rate at which $Q_n/\sqrt{n}$ converges to
its population counterpart - a condition originally imposed in estimation problems with
non-compact parameter spaces by Chen and Pouzo (2012), who also discuss sufficient
conditions for it; see Remark 4.1.

Given the introduced assumption, Lemma 4.1 establishes the consistency of $\hat{\Theta}_n \cap R$.

Lemma 4.1. Let Assumptions 3.1, 3.2(i), 3.3, 3.4, and 4.1 hold. (i) If the sequence
$\{\tau_n\}$ satisfies $\tau_n = o(S_n(\epsilon))$ for all $\epsilon > 0$, then it follows that uniformly in $P \in \mathbf{P}_0$

$$\overrightarrow{d}_H(\hat{\Theta}_n \cap R, \Theta_{0n}(P) \cap R, \|\cdot\|_{\mathbf{B}}) = o_p(1) \;. \tag{43}$$

(ii) Moreover, if in addition $\{\tau_n\}$ is such that $\big(\frac{k_n^{1/r}\sqrt{\log(k_n)}J_nB_n}{\sqrt{n}} + \zeta_n\big) = o(\tau_n)$, then

$$\liminf_{n \to \infty} \inf_{P \in \mathbf{P}_0} P(\Theta_{0n}(P) \cap R \subseteq \hat{\Theta}_n \cap R) = 1 \;. \tag{44}$$

The first claim of Lemma 4.1 shows that, provided $\tau_n \downarrow 0$ sufficiently fast, $\hat{\Theta}_n \cap R$ is
contained in arbitrary neighborhoods of $\Theta_{0n}(P) \cap R$ with probability approaching one.
This conclusion will be of use when characterizing the distribution of $I_n(R)$. In turn, the
second claim of Lemma 4.1 establishes that, provided $\tau_n \downarrow 0$ slowly enough, $\Theta_{0n}(P) \cap R$
is contained in $\hat{\Theta}_n \cap R$ with probability approaching one. This second conclusion will be
of use when employing $\hat{\Theta}_n \cap R$ to construct an estimator of the distribution of $I_n(R)$.

Remark 4.1. Under Assumption 3.2(ii), it is possible to show there is a $C < \infty$ with

$$\inf_{\theta \in (\Theta_n \cap R) \setminus (\Theta_{0n}(P) \cap R)^\epsilon} \|E_P[\rho(X_i, \theta) * q_n^{k_n}(Z_i)]\|_r \leq C \times \inf_{\theta \in (\Theta_n \cap R) \setminus (\Theta_{0n}(P) \cap R)^\epsilon} \Big\{\sum_{j=1}^J E_P[(E_P[\rho_j(X_i, \theta)|Z_{i,j}])^2]\Big\}^{\frac{1}{2}} \;; \tag{45}$$

see Lemma C.5. Therefore, if the problem is ill-posed and the sieve $\Theta_n \cap R$ grows dense
in $\Theta \cap R$, result (45) implies that $S_n(\epsilon) = o(1)$ for all $\epsilon > 0$ as in Chen and Pouzo (2012).
In contrast, Newey and Powell (2003) address the ill-posed inverse problem by imposing
compactness of the parameter space. Analogously, in our setting it is possible to show
that if $\Theta \cap R$ is compact and $\{q_{k,n,j}\}_{k=1}^{k_{n,j}}$ are suitably dense, then

$$\liminf_{n \to \infty} \inf_{\theta \in (\Theta \cap R) \setminus (\Theta_{0n}(P) \cap R)^\epsilon} \|E_P[\rho(X_i, \theta) * q_n^{k_n}(Z_i)]\|_r > 0 \tag{46}$$

for any $P \in \mathbf{P}_0$. Hence, under compactness of $\Theta \cap R$, it is possible for $\liminf S_n(\epsilon) > 0$,
in which case Assumption 4.1(ii) follows from 4.1(i) and $k_n^{1/r}\sqrt{\log(k_n)}B_nJ_n = o(\sqrt{n})$. ■


4.2 Convergence Rate 

The consistency result in Lemma 4.1 enables us to derive a rate of convergence by
exploiting the local behavior of the population criterion function in a neighborhood of
$\Theta_{0n}(P) \cap R$. We do not study a rate of convergence under the original norm $\|\cdot\|_{\mathbf{B}}$, however,
but instead introduce a potentially weaker norm we denote by $\|\cdot\|_{\mathbf{E}}$.

Heuristically, the need to introduce $\|\cdot\|_{\mathbf{E}}$ arises from the “strength” of $\|\cdot\|_{\mathbf{B}}$ being
determined by the requirement that the maps $\Upsilon_F : \mathbf{B} \to \mathbf{F}$ and $\Upsilon_G : \mathbf{B} \to \mathbf{G}$ be
continuous under $\|\cdot\|_{\mathbf{B}}$; recall Assumption 2.2(ii). On the other hand, in approximating the
distribution of $I_n(R)$ we must also rely on a metric under which the empirical process
is stochastically equicontinuous - a purpose for which $\|\cdot\|_{\mathbf{B}}$ is often “too strong”, with
its use leading to overly stringent assumptions. Thus, while $\|\cdot\|_{\mathbf{B}}$ ensures continuity
of the maps $\Upsilon_F$ and $\Upsilon_G$, we employ a weaker norm $\|\cdot\|_{\mathbf{E}}$ to guarantee the stochastic
equicontinuity of the empirical process - here “E” stands for equicontinuity. The
following assumption formally introduces $\|\cdot\|_{\mathbf{E}}$ and enables us to obtain a rate of convergence
under the induced Hausdorff distance.





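Assumption 4.2. (i) For a Banach space $\mathbf{E}$ with norm $\|\cdot\|_{\mathbf{E}}$ satisfying $\mathbf{B}_n \subseteq \mathbf{E}$ for
all $n$, there is an $\epsilon > 0$ and sequence $\{\nu_n\}_{n=1}^\infty$ with $\nu_n^{-1} = O(1)$ such that

$$\nu_n^{-1}\overrightarrow{d}_H(\{\theta\}, \Theta_{0n}(P) \cap R, \|\cdot\|_{\mathbf{E}}) \leq \{\|E_P[\rho(X_i, \theta) * q_n^{k_n}(Z_i)]\|_r + O(\zeta_n)\}$$

for all $P \in \mathbf{P}_0$ and $\theta \in (\Theta_{0n}(P) \cap R)^\epsilon = \{\theta \in \Theta_n \cap R : \overrightarrow{d}_H(\{\theta\}, \Theta_{0n}(P) \cap R, \|\cdot\|_{\mathbf{B}}) < \epsilon\}$.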


Intuitively, Assumption 4.2 may be interpreted as a generalization of a classical
local identification condition. In particular, the parameter $\nu_n^{-1}$ measures the strength of
identification, with large/small values of $\nu_n^{-1}$ indicating how quickly/slowly the
criterion grows as $\theta$ moves away from the set of minimizers $\Theta_{0n}(P) \cap R$. The strength of
identification, however, may decrease with $n$ for at least two reasons. First, in ill-posed
problems $\nu_n^{-1}$ decreases with the dimension of the sieve, reflecting that local
identification is attained in finite dimensional subspaces but not on the entire parameter space.
Second, the strength of identification is affected by the choice of norm $\|\cdot\|_r$ employed
in the construction of $Q_n$. While the norms $\|\cdot\|_r$ are equivalent on any fixed finite
dimensional space, their modulus of continuity can decrease with the number of moments,
which in turn affects $\nu_n^{-1}$; see Remark 4.2.

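The following Theorem exploits Assumption 4.2 to obtain a rate of convergence.

Theorem 4.1. Let Assumptions 3.1, 3.2(i), 3.3, 3.4, 4.1, and 4.2 hold, and let

$$\mathcal{R}_n = \nu_n\Big\{\frac{k_n^{1/r}\sqrt{\log(k_n)}J_nB_n}{\sqrt{n}} + \zeta_n\Big\} \;. \tag{47}$$

(i) If $\{\tau_n\}$ satisfies $\tau_n = o(S_n(\epsilon))$ for any $\epsilon > 0$, then it follows that uniformly in $P \in \mathbf{P}_0$

$$\overrightarrow{d}_H(\hat{\Theta}_n \cap R, \Theta_{0n}(P) \cap R, \|\cdot\|_{\mathbf{E}}) = O_p(\mathcal{R}_n + \nu_n\tau_n) \;. \tag{48}$$

(ii) Moreover, if in addition $\big(\frac{k_n^{1/r}\sqrt{\log(k_n)}J_nB_n}{\sqrt{n}} + \zeta_n\big) = o(\tau_n)$, then uniformly
in $P \in \mathbf{P}_0$

$$d_H(\hat{\Theta}_n \cap R, \Theta_{0n}(P) \cap R, \|\cdot\|_{\mathbf{E}}) = O_p(\mathcal{R}_n + \nu_n\tau_n) \;. \tag{49}$$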
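
To see how the ingredients of the rate in (47) trade off, the expression can be evaluated numerically. The following Python sketch simply computes the formula; the function name and the input values are hypothetical and chosen only for illustration.

```python
import math

def rate_bound(n, k_n, r, J_n, B_n, zeta_n, nu_n):
    """Evaluate nu_n * (k_n^{1/r} sqrt(log k_n) J_n B_n / sqrt(n) + zeta_n), as in (47)."""
    sampling_term = k_n ** (1.0 / r) * math.sqrt(math.log(k_n)) * J_n * B_n / math.sqrt(n)
    return nu_n * (sampling_term + zeta_n)

# Hypothetical inputs: a well-posed design (nu_n = 1) versus an ill-posed one (nu_n = 10).
well_posed = rate_bound(n=10_000, k_n=16, r=2, J_n=1.0, B_n=1.0, zeta_n=0.0, nu_n=1.0)
ill_posed = rate_bound(n=10_000, k_n=16, r=2, J_n=1.0, B_n=1.0, zeta_n=0.0, nu_n=10.0)
```

The rate scales linearly in $\nu_n$, so weaker identification (larger $\nu_n$) translates one for one into a slower rate.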


Together, Lemma 4.1 and Theorem 4.1 establish the consistency (in $\|\cdot\|_{\mathbf{B}}$) and rate
of convergence (in $\|\cdot\|_{\mathbf{E}}$) of the set estimator $\hat{\Theta}_n \cap R$. While we exploit these results
in our forthcoming analysis, it is important to emphasize that in specific applications
alternative assumptions that are better suited for the particular structure of the model
may be preferable. In this regard, we note that Assumptions 4.1 and 4.2 are not needed
in our analysis beyond their role in delivering consistency and a rate of convergence
result through Lemma 4.1 and Theorem 4.1 respectively. In particular, if an alternative
rate of convergence $\mathcal{R}_n$ is derived under different assumptions, then such a result can
still be combined with our forthcoming analysis to establish the validity of the proposed
inferential methods - i.e. our inference results remain valid if Assumptions 4.1 and 4.2
are instead replaced with a high level condition that $\hat{\Theta}_n \cap R$ be consistent (in $\|\cdot\|_{\mathbf{B}}$)
with an appropriate rate of convergence $\mathcal{R}_n$ (in $\|\cdot\|_{\mathbf{E}}$).


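Remark 4.2. In models in which $\Theta_{0n}(P) \cap R$ is a singleton, Assumption 4.2 is analogous
to a standard local identification condition (Chen et al., 2014). In particular, suppose
$\{b_j\}_{j=1}^{j_n}$ is a basis for $\mathbf{B}_n$ and for each $1 \leq j \leq j_n$ and $\theta \in (\Theta_{0n}(P) \cap R)^\epsilon$ define

$$A_{P,n}^{(j)}(\theta) = \frac{\partial}{\partial\tau}E_P[\rho(X_i, \theta + \tau b_j) * q_n^{k_n}(Z_i)]\Big|_{\tau = 0} \tag{50}$$

and set $A_{P,n}(\theta) = [A_{P,n}^{(1)}(\theta), \ldots, A_{P,n}^{(j_n)}(\theta)]$. Further let $\alpha : \mathbf{B}_n \to \mathbf{R}^{j_n}$ be such that

$$b = \sum_{j=1}^{j_n} \alpha_j(b) \times b_j \tag{51}$$

for any $b \in \mathbf{B}_n$ and $\alpha(b) = (\alpha_1(b), \ldots, \alpha_{j_n}(b))'$. If the smallest singular value of $A_{P,n}(\theta)$
is bounded from below by some $\vartheta_n > 0$ uniformly in $\theta \in (\Theta_{0n}(P) \cap R)^\epsilon$ and $P \in \mathbf{P}_0$,
then it is straightforward to show Assumption 4.2 holds with $\nu_n = \vartheta_n^{-1} \times k_n^{1/2 - 1/r}$ and
the norm $\|b\|_{\mathbf{E}} = \|\alpha(b)\|_2$ - a norm that is closely related to $\|\cdot\|_{L^2_P}$ when $\mathbf{B} \subseteq L^2_P$. ■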
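
As a sketch of where the factor $k_n^{1/2 - 1/r}$ comes from, consider the simplified case, assumed here purely for illustration, in which $r \geq 2$, $\Theta_{0n}(P) \cap R = \{\theta_0\}$, and the moments are exactly linear in $\theta$, so that the matrix of derivatives $A_{P,n}$ does not depend on $\theta$. Write $\bar{m}(\theta) = E_P[\rho(X_i, \theta) * q_n^{k_n}(Z_i)]$ (a shorthand introduced only for this sketch) and let $\vartheta_n > 0$ denote the lower bound on the smallest singular value of $A_{P,n}$. Then

$$\begin{aligned}
\|\bar{m}(\theta) - \bar{m}(\theta_0)\|_r &= \|A_{P,n}\,\alpha(\theta - \theta_0)\|_r \\
&\geq k_n^{-(1/2 - 1/r)}\,\|A_{P,n}\,\alpha(\theta - \theta_0)\|_2 \\
&\geq k_n^{-(1/2 - 1/r)}\,\vartheta_n\,\|\alpha(\theta - \theta_0)\|_2 ,
\end{aligned}$$

where the first inequality uses $\|a\|_2 \leq k_n^{1/2 - 1/r}\|a\|_r$ for any $a \in \mathbf{R}^{k_n}$ and $r \geq 2$. Since $\theta_0$ minimizes the population criterion over the sieve, Assumption 4.1(i) gives $\|\bar{m}(\theta_0)\|_r \leq \zeta_n$, and the triangle inequality yields

$$\vartheta_n\,k_n^{-(1/2 - 1/r)}\,\|\alpha(\theta - \theta_0)\|_2 \leq \|\bar{m}(\theta)\|_r + \zeta_n ,$$

which has the form required by Assumption 4.2 with $\|b\|_{\mathbf{E}} = \|\alpha(b)\|_2$.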


5 Strong Approximation 

In this section, we exploit the rate of convergence derived in Theorem 4.1 to obtain a
strong approximation to the proposed test statistic $I_n(R)$. We proceed in two steps.
First, we construct a preliminary local approximation involving the norm of a Gaus¬ 
sian process with an unknown “drift”. Second, we refine the initial approximation by 
linearizing the “drift” while accommodating possibly severely ill-posed problems. 

5.1 Local Approximation 

The first strong approximation to our test statistic relies on the following assumptions: 

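Assumption 5.1. (i) $\sup_{f \in \mathcal{F}_n} \|\mathbb{G}_{n,P}fq_n^{k_n} - \mathbb{W}_{n,P}fq_n^{k_n}\|_r = o_p(a_n)$ uniformly in $P \in \mathbf{P}$,
where $\{a_n\}_{n=1}^\infty$ is some known bounded sequence.

Assumption 5.2. (i) There exist $\kappa_\rho > 0$ and $K_\rho < \infty$ such that for all $n$, $P \in \mathbf{P}$, and
all $\theta_1, \theta_2 \in \Theta_n \cap R$ we have that $E_P[\|\rho(X_i, \theta_1) - \rho(X_i, \theta_2)\|_2^2] \leq K_\rho^2\|\theta_1 - \theta_2\|_{\mathbf{E}}^{2\kappa_\rho}$.

Assumption 5.3. (i) $k_n^{1/r}\sqrt{\log(k_n)}B_n \sup_{P \in \mathbf{P}} J_{[\,]}(\mathcal{R}_n^{\kappa_\rho}, \mathcal{F}_n, \|\cdot\|_{L^2_P}) = o(a_n)$; (ii) $\sqrt{n}\zeta_n = o(a_n)$;
(iii) $\|\hat{\Sigma}_n - \Sigma_n(P)\|_{o,r} = o_p(a_n\{k_n^{1/r}\sqrt{\log(k_n)}B_nJ_n\}^{-1})$ uniformly in $P \in \mathbf{P}$.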
Assumption 5.1(i) requires that the empirical process $\mathbb{G}_{n,P}$ be approximated by an
isonormal Gaussian process $\mathbb{W}_{n,P}$ uniformly in $P \in \mathbf{P}$. Intuitively, Assumption 5.1(i)
replaces the traditional requirement of convergence in distribution by a strong
approximation, which is required to handle the asymptotically non-Donsker setting that arises
naturally in our case and other related problems; see Chernozhukov et al. (2013) for
further discussion. The sequence $\{a_n\}_{n=1}^\infty$ in Assumption 5.1(i) denotes a bound on the
rate of convergence of the coupling to the empirical process, which will in turn
characterize the rate of convergence of our strong approximation to $I_n(R)$. We provide sufficient
conditions for verifying Assumption 5.1(i) based on Koltchinskii (1994)’s coupling in
Corollary G.1 in Appendix E. These results could be of independent interest.
Alternatively, Assumption 5.1(i) can be verified by employing methods based on Rio (1994)’s
coupling or Yurinskii (1977)’s coupling; see e.g., Chernozhukov et al. (2013).
Assumption 5.2(i) is a Hölder continuity condition on the map $\rho(X_i, \cdot) : \Theta_n \cap R \to \{L^2_P\}^J$
with respect to the norm $\|\cdot\|_{\mathbf{E}}$, and thus ensures that $\mathbb{W}_{n,P}\rho(\cdot, \theta) * q_n^{k_n}$ is equicontinuous with
respect to the index $\theta$ under $\|\cdot\|_{\mathbf{E}}$ for fixed $n$. However, this process gradually loses
its equicontinuity property as $n$ diverges to infinity due to the addition of moments and
increasing complexity of the class $\mathcal{F}_n$. Hence, Assumption 5.3(i) demands that the $\|\cdot\|_{\mathbf{E}}$-rate
of convergence ($\mathcal{R}_n$) be sufficiently fast to overcome the loss of equicontinuity at a
rate no slower than $a_n$ (as in Assumption 5.1(i)). Finally, Assumption 5.3(ii) ensures
the test statistic is asymptotically properly centered under the null hypothesis, while
Assumption 5.3(iii) controls the rate of convergence of the weighting matrix.

Together, the results of Section 4 and Assumptions 5.1, 5.2, and 5.3 enable us to
obtain a strong approximation to the test statistic $I_n(R)$. To this end, we define

$$V_n(\theta, \ell) = \Big\{\frac{h}{\sqrt{n}} \in \mathbf{B}_n : \theta + \frac{h}{\sqrt{n}} \in \Theta_n \cap R \text{ and } \Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{E}} \leq \ell\Big\} \;, \tag{52}$$

which for any $\theta \in \Theta_n \cap R$ constitutes the collection of local deviations from $\theta$ that remain
in the constrained sieve $\Theta_n \cap R$. Thus the local parameter space $V_n(\theta, \ell)$ is indexed by
$\theta$, which runs over $\Theta_n \cap R$, and parameterized by deviations $h/\sqrt{n}$ from $\theta$; this follows
previous uses in Chernozhukov et al. (2007) and Santos (2007). The normalization
by $\sqrt{n}$ plays no particular role here, since $\ell$ can grow, and merely visually emphasizes
localization. By Theorem 4.1, it then follows that in studying $I_n(R)$ we need not consider
the infimum over the entire sieve (see (26)) but may instead examine the infimum over
local deviations to $\Theta_{0n}(P) \cap R$ - i.e. the infimum over parameters

$$\Big(\theta_0, \frac{h}{\sqrt{n}}\Big) \in (\Theta_{0n}(P) \cap R, V_n(\theta_0, \ell_n)) \tag{53}$$

with the neighborhood $V_n(\theta_0, \ell_n)$ shrinking at an appropriate rate ($\mathcal{R}_n = o(\ell_n)$). In
turn, Assumptions 5.1, 5.2, and 5.3 control the relevant stochastic processes over the
localized space (53) and allow us to characterize the distribution of $I_n(R)$.
The following Lemma formalizes the preceding discussion.

Lemma 5.1. Let Assumptions 3.1, 3.2(i), 3.3, 3.4, 4.1, 5.1, 5.2, and 5.3 hold. It
then follows that for any sequence $\{\ell_n\}$ satisfying $\mathcal{R}_n = o(\ell_n)$ and $k_n^{1/r}\sqrt{\log(k_n)}B_n \times
\sup_{P \in \mathbf{P}} J_{[\,]}(\ell_n^{\kappa_\rho}, \mathcal{F}_n, \|\cdot\|_{L^2_P}) = o(a_n)$, we have uniformly in $P \in \mathbf{P}_0$ that

$$I_n(R) = \inf_{\theta_0 \in \Theta_{0n}(P) \cap R}\ \inf_{\frac{h}{\sqrt{n}} \in V_n(\theta_0, \ell_n)} \Big\|\mathbb{W}_{n,P}\rho(\cdot, \theta_0) * q_n^{k_n} + \sqrt{n}P\rho\Big(\cdot, \theta_0 + \frac{h}{\sqrt{n}}\Big) * q_n^{k_n}\Big\|_{\Sigma_n(P),r} + o_p(a_n) \;.$$

Lemma 5.1 establishes our first strong approximation and further characterizes the
rate of convergence to be no slower than $a_n$ (as in Assumption 5.1). Thus, for a
consistent coupling we only require that Assumptions 5.1 and 5.3 hold with $\{a_n\}_{n=1}^\infty$ a
bounded sequence. In certain applications, however, successful estimation of critical
values will additionally require us to impose that $a_n$ be logarithmic or double
logarithmic; see Section 6.3. We further note that for the conclusion of Lemma 5.1 to hold,
the neighborhoods $V_n(\theta, \ell_n)$ must shrink at a rate $\ell_n$ satisfying two conditions. First,
$\ell_n$ must decrease to zero slowly enough to ensure the infimum over the entire sieve is
indeed equivalent to the infimum over the localized space ($\mathcal{R}_n = o(\ell_n)$). Second, $\ell_n$
must decrease to zero sufficiently fast to overcome the gradual loss of equicontinuity
of the isonormal process $\mathbb{W}_{n,P}$ - notice $\mathbb{W}_{n,P}$ is evaluated at $\rho(\cdot, \theta_0) * q_n^{k_n}$ in place of
$\rho(\cdot, \theta_0 + h/\sqrt{n}) * q_n^{k_n}$. The existence of a sequence $\ell_n$ satisfying these requirements is
guaranteed by Assumption 5.3(i). However, as we next discuss, the approximation in
Lemma 5.1 must be further refined before it can be exploited for inference.

5.2 Drift Linearization 

A challenge arising from Lemma 5.1 is the need for a tractable expression for the term

$$\sqrt{n}E_P\Big[\rho\Big(X_i, \theta_0 + \frac{h}{\sqrt{n}}\Big) * q_n^{k_n}(Z_i)\Big] \;, \tag{54}$$

which we refer to as the local “drift” of the isonormal process. Typically, the drift is
approximated by a linear function of the local parameter $h$ by requiring an appropriate
form of differentiability of the moment functions. In this section, we build on this
approach by requiring differentiability of the maps $m_{P,j} : \Theta \cap R \to L^2_P$ defined by

$$m_{P,j}(\theta)(Z_{i,j}) = E_P[\rho_j(X_i, \theta)|Z_{i,j}] \;. \tag{55}$$

In the same manner that a norm $\|\cdot\|_{\mathbf{E}}$ was needed to ensure stochastic equicontinuity
of the empirical process, we now introduce a final norm $\|\cdot\|_{\mathbf{L}}$ to deliver differentiability
of the maps $m_{P,j}$ - here “L” stands for linearization. Thus, $\|\cdot\|_{\mathbf{B}}$ is employed to ensure
smoothness of the maps $\Upsilon_F$ and $\Upsilon_G$, $\|\cdot\|_{\mathbf{E}}$ guarantees the stochastic equicontinuity of
$\mathbb{G}_{n,P}$, and $\|\cdot\|_{\mathbf{L}}$ delivers the smoothness of the maps $m_{P,j}$. Formally, we impose:

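Assumption 5.4. For a Banach space $\mathbf{L}$ with norm $\|\cdot\|_{\mathbf{L}}$ and $\mathbf{B}_n \subseteq \mathbf{L}$ for all $n$, there
are $K_m < \infty$, $M_m < \infty$, $\epsilon > 0$ such that for all $1 \leq j \leq J$, $n \in \mathbf{N}$, $P \in \mathbf{P}_0$, and
$\theta_1 \in (\Theta_{0n}(P) \cap R)^\epsilon$, there is a linear $\nabla m_{P,j}(\theta_1) : \mathbf{B} \to L^2_P$ satisfying for all $h \in \mathbf{B}_n$:
(i) $\|m_{P,j}(\theta_1 + h) - m_{P,j}(\theta_1) - \nabla m_{P,j}(\theta_1)[h]\|_{L^2_P} \leq K_m\|h\|_{\mathbf{L}}\|h\|_{\mathbf{E}}$; (ii) $\|\nabla m_{P,j}(\theta_1)[h] -
\nabla m_{P,j}(\theta_0)[h]\|_{L^2_P} \leq K_m\|\theta_1 - \theta_0\|_{\mathbf{L}}\|h\|_{\mathbf{E}}$; and (iii) $\|\nabla m_{P,j}(\theta_0)[h]\|_{L^2_P} \leq M_m\|h\|_{\mathbf{E}}$.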

Heuristically, Assumption 5.4(i) simply demands that the functions $m_{P,j} : \Theta \cap R \to
L^2_P$ be locally well approximated under $\|\cdot\|_{L^2_P}$ by linear maps $\nabla m_{P,j} : \mathbf{B} \to L^2_P$.
Moreover, the approximation error is required to be controlled by the product of the $\|\cdot\|_{\mathbf{E}}$
and $\|\cdot\|_{\mathbf{L}}$ norms; see Remark 5.1 for a leading example. We emphasize, however, that
Assumption 5.4(i) does not require the generalized residuals $\rho_j(X_i, \cdot) : \Theta \cap R \to \mathbf{R}$
themselves to be differentiable, and thus accommodates models such as the nonparametric
quantile IV regression of Chernozhukov and Hansen (2005). In addition, we note that
whenever $\rho_j(X_i, \theta)$ is linear in $\theta$, such as in the nonparametric IV regression of Newey
and Powell (2003), Assumption 5.4(i) is automatically satisfied with $K_m = 0$. Finally,
Assumptions 5.4(ii) and 5.4(iii) respectively require the derivatives $\nabla m_{P,j}(\theta) : \mathbf{B} \to L^2_P$
to be Lipschitz continuous in $\theta$ with respect to $\|\cdot\|_{\mathbf{L}}$ and norm bounded uniformly on
$\theta \in \Theta_{0n}(P) \cap R$ and $P \in \mathbf{P}_0$. The latter two assumptions are not required for the
purposes of refining the strong approximation of Lemma 5.1, but will be needed for the
study of our inferential procedure in Section 6.

Given Assumption 5.4, we next aim to approximate the local drift in Lemma 5.1
(see (54)) by a linear map $\mathbb{D}_{n,P}(\theta_0) : \mathbf{B}_n \to \mathbf{R}^{k_n}$ pointwise defined by

$$\mathbb{D}_{n,P}(\theta_0)[h] = E_P[\nabla m_P(\theta_0)[h](Z_i) * q_n^{k_n}(Z_i)] \;, \tag{56}$$

where $\nabla m_P(\theta_0)[h](Z_i) = (\nabla m_{P,1}(\theta_0)[h](Z_{i,1}), \ldots, \nabla m_{P,J}(\theta_0)[h](Z_{i,J}))'$. Regrettably, it
is well understood that, particularly in severely ill-posed problems, the rate of
convergence may be too slow for $\mathbb{D}_{n,P}(\theta_0)$ to approximate the drift uniformly over the local
parameter space in nonlinear models (Chen and Pouzo, 2009; Chen and Reiss, 2011).
However, while such a complication can present important challenges when employing
the asymptotic distribution of estimators for inference, severely ill-posed problems can
still be accommodated in our setting. Specifically, instead of considering the entire
local parameter space, as in Lemma 5.1, we may restrict attention to an infimum over
the subset of the local parameter space for which an approximation of the drift by
$\mathbb{D}_{n,P}(\theta_0)$ is indeed warranted. The resulting bound for $I_n(R)$ is potentially conservative
in nonlinear models when the rate of convergence $\mathcal{R}_n$ is not sufficiently fast, but remains
asymptotically equivalent to $I_n(R)$ in the remaining settings.
Figure 1: Local Drift Linearization



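Our next theorem characterizes the properties of the described strong approximation.
It is helpful to note here that the notation $\mathcal{S}_n(\mathbf{A}_1, \mathbf{A}_2)$ is defined in Section 3.2.1 as the
modulus of continuity (on $\mathbf{B}_n$) between the norms on two spaces $\mathbf{A}_1$ and $\mathbf{A}_2$ (see (35)).

Theorem 5.1. Let Assumptions 3.1, 3.2, 3.3, 3.4, 4.1, 4.2, 5.1, 5.2, 5.3, and 5.4(i)
hold. (i) Then, for any sequence $\{\ell_n\}$ satisfying $K_m\ell_n^2 \times \mathcal{S}_n(\mathbf{L}, \mathbf{E}) = o(a_nn^{-\frac{1}{2}})$ and
$k_n^{1/r}\sqrt{\log(k_n)}B_n \times \sup_{P \in \mathbf{P}} J_{[\,]}(\ell_n^{\kappa_\rho}, \mathcal{F}_n, \|\cdot\|_{L^2_P}) = o(a_n)$ it follows that

$$I_n(R) \leq \inf_{\theta_0 \in \Theta_{0n}(P) \cap R}\ \inf_{\frac{h}{\sqrt{n}} \in V_n(\theta_0, \ell_n)} \|\mathbb{W}_{n,P}\rho(\cdot, \theta_0) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_0)[h]\|_{\Sigma_n(P),r} + o_p(a_n) \;,$$

uniformly in $P \in \mathbf{P}_0$. (ii) Moreover, if in addition $K_m\mathcal{R}_n^2 \times \mathcal{S}_n(\mathbf{L}, \mathbf{E}) = o(a_nn^{-\frac{1}{2}})$, then
the sequence $\{\ell_n\}$ may be chosen so that uniformly in $P \in \mathbf{P}_0$

$$I_n(R) = \inf_{\theta_0 \in \Theta_{0n}(P) \cap R}\ \inf_{\frac{h}{\sqrt{n}} \in V_n(\theta_0, \ell_n)} \|\mathbb{W}_{n,P}\rho(\cdot, \theta_0) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_0)[h]\|_{\Sigma_n(P),r} + o_p(a_n) \;.$$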

The conclusion of Theorem 5.1 can be readily understood through Figure 1, which
illustrates the special case in which $J = 1$, $k_n = 2$, $\mathbf{B}_n = \mathbf{R}$, and $V_n(\theta_0, +\infty) = \mathbf{R}$. In
this context, the Gaussian process $\mathbb{W}_{n,P}\rho(\cdot, \theta_0) * q_n^{k_n}$ is simply a bivariate normal random
variable in $\mathbf{R}^2$ that we denote by $\mathbb{W}$ for conciseness. In turn, the drift is a surface on $\mathbf{R}^2$
that is approximately linear (equal to $\mathbb{D}_{n,P}(\theta_0)[h]$) in a neighborhood of zero. According
to Lemma 5.1, $I_n(R)$ is then asymptotically equivalent to the distance between $\mathbb{W}$ and
the surface representing the drift. Intuitively, Theorem 5.1(i) then bounds $I_n(R)$ by
the distance between $\mathbb{W}$ and the restriction of the drift surface to the region where it
is linear - a bound that may be equal to or strictly larger than $I_n(R)$ as illustrated
by the realizations $\mathbb{W}_1$ and $\mathbb{W}_2$ respectively. However, if the rate of convergence $\mathcal{R}_n$
is sufficiently fast or $\rho(X_i, \theta)$ is linear in $\theta$ ($K_m = 0$), then Theorem 5.1(ii) establishes
$I_n(R)$ is in fact asymptotically equivalent to the derived bound - i.e. the realizations of
$\mathbb{W}$ behave in the manner of $\mathbb{W}_1$ and not that of $\mathbb{W}_2$.

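Remark 5.1. In an important class of models studied by Newey and Powell (2003),
$\mathbf{B} \subseteq L^\infty_P$ and the generalized residual function $\rho(X_i, \theta)$ has the structure

$$\rho(X_i, \theta) = \tilde{\rho}(X_i, \theta(V_i)) \tag{57}$$

for a known map $\tilde{\rho} : \mathbf{R}^{d_x} \times \mathbf{R} \to \mathbf{R}$. Suppose $\tilde{\rho}(X_i, \cdot) : \mathbf{R} \to \mathbf{R}$ is differentiable for all
$X_i$ with derivative denoted $\nabla_\theta\tilde{\rho}(X_i, \cdot)$ and satisfying for some $L_m < \infty$

$$|\nabla_\theta\tilde{\rho}(X_i, u_1) - \nabla_\theta\tilde{\rho}(X_i, u_2)| \leq L_m|u_1 - u_2| \;. \tag{58}$$

It is then straightforward to verify that Assumptions 5.4(i)-(ii) hold with $K_m = L_m$,
$\|\cdot\|_{\mathbf{E}} = \|\cdot\|_{L^2_P}$, and $\|\cdot\|_{\mathbf{L}} = \|\cdot\|_{L^\infty_P}$, while Assumption 5.4(iii) is satisfied provided
$\nabla_\theta\tilde{\rho}(X_i, \theta_0(V_i))$ is bounded uniformly in $(X_i, V_i)$, $\theta_0 \in \Theta_{0n}(P) \cap R$, and $P \in \mathbf{P}_0$. ■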


6 Bootstrap Inference 

The results of Section 5 establish a strong approximation to our test statistic and thus 
provide us with a candidate distribution whose quantiles may be employed to conduct 
valid inference. In this section, we develop an estimator for the approximating distribu¬ 
tion derived in Section 5 and study its corresponding critical values. 


6.1 Bootstrap Statistic 


Theorem 5.1 indicates a valid inferential procedure can be constructed by comparing
the test statistic $I_n(R)$ to the quantiles of the distribution of the random variable

$$U_{n,P}(R) = \inf_{\theta_0 \in \Theta_{0n}(P) \cap R}\ \inf_{\frac{h}{\sqrt{n}} \in V_n(\theta_0, \ell_n)} \|\mathbb{W}_{n,P}\rho(\cdot, \theta_0) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_0)[h]\|_{\Sigma_n(P),r} \;. \tag{59}$$

In particular, as a result of Theorem 5.1(i), we may expect that employing the quantiles
of $U_{n,P}(R)$ as critical values for $I_n(R)$ can control asymptotic size even in severely
ill-posed nonlinear problems. Moreover, as a result of Theorem 5.1(ii), we may further
expect the asymptotic size of the resulting test to equal its significance level at least in
linear problems ($K_m = 0$) or when the rate of convergence ($\mathcal{R}_n$) is sufficiently fast.

In what follows, we construct an estimator of the distribution of $U_{n,P}(R)$ by replacing
the population parameters in (59) with suitable sample analogues. To this end, we note
that by Theorem 4.1 and Assumption 3.4(i), the set $\Theta_{0n}(P) \cap R$ and weighting matrix
$\Sigma_n(P)$ may be estimated by $\hat{\Theta}_n \cap R$ and $\hat{\Sigma}_n$ respectively. Thus, in mimicking (59), we
only additionally require sample analogues $\hat{\mathbb{W}}_n$ for the isonormal process $\mathbb{W}_{n,P}$, $\hat{\mathbb{D}}_n(\theta)$
for the derivative $\mathbb{D}_{n,P}(\theta)$, and $\hat{V}_n(\theta, \ell_n)$ for the local parameter space $V_n(\theta, \ell_n)$. Given
such analogues we may then approximate the distribution of $U_{n,P}(R)$ by that of

$$\hat{U}_n(R) = \inf_{\theta \in \hat{\Theta}_n \cap R}\ \inf_{\frac{h}{\sqrt{n}} \in \hat{V}_n(\theta, \ell_n)} \|\hat{\mathbb{W}}_n\rho(\cdot, \theta) * q_n^{k_n} + \hat{\mathbb{D}}_n(\theta)[h]\|_{\hat{\Sigma}_n,r} \;. \tag{60}$$

In the next two sections, we first propose standard estimators for the isonormal process
$\mathbb{W}_{n,P}$ and derivative $\mathbb{D}_{n,P}(\theta)$, and subsequently address the more challenging task of
constructing an appropriate sample analogue for the local parameter space $V_n(\theta, \ell_n)$.

6.1.1 The Basics 

We approximate the law of the isonormal process $\mathbb{W}_{n,P}$ by relying on the multiplier
bootstrap (Ledoux and Talagrand, 1988). Specifically, for an i.i.d. sample $\{\omega_i\}_{i=1}^n$ with
$\omega_i$ following a standard normal distribution and independent of $\{V_i\}_{i=1}^n$ we set

$$\hat{\mathbb{W}}_nf = \frac{1}{\sqrt{n}}\sum_{i=1}^n \omega_i\Big\{f(V_i) - \frac{1}{n}\sum_{j=1}^n f(V_j)\Big\} \tag{61}$$

for any function $f \in L^2_P$. Since $\{\omega_i\}_{i=1}^n$ are standard normal random variables drawn
independently of $\{V_i\}_{i=1}^n$, it follows that conditionally on $\{V_i\}_{i=1}^n$ the law of $\hat{\mathbb{W}}_nf$ is also
Gaussian, has mean zero, and in addition satisfies for any $f$ and $g$ (compare to (33))

$$E[\hat{\mathbb{W}}_nf\,\hat{\mathbb{W}}_ng|\{V_i\}_{i=1}^n] = \frac{1}{n}\sum_{i=1}^n \Big(f(V_i) - \frac{1}{n}\sum_{j=1}^n f(V_j)\Big)\Big(g(V_i) - \frac{1}{n}\sum_{j=1}^n g(V_j)\Big) \;. \tag{62}$$

Hence, $\hat{\mathbb{W}}_n$ can be simply viewed as a Gaussian process whose covariance kernel equals
the sample analogue of the unknown covariance kernel of $\mathbb{W}_{n,P}$.
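
The multiplier construction in (61) and the covariance identity in (62) can be checked numerically. The following Python sketch assumes NumPy is available; the function name, the simulated sample, and the two functions $f(v) = v$ and $g(v) = v^2$ are hypothetical choices made only for illustration.

```python
import numpy as np

def multiplier_draws(f_vals, n_draws, rng):
    """Draws of n^{-1/2} sum_i w_i (f(V_i) - mean_j f(V_j)) with standard normal w_i.

    f_vals : (n, m) array whose columns are m functions evaluated at the sample
    returns: (n_draws, m) array, one row per multiplier draw
    """
    f_vals = np.asarray(f_vals, dtype=float)
    n = f_vals.shape[0]
    centered = f_vals - f_vals.mean(axis=0)       # subtract the sample mean of each function
    omega = rng.standard_normal((n_draws, n))     # multipliers, independent of the data
    return omega @ centered / np.sqrt(n)

rng = np.random.default_rng(0)
V = rng.uniform(size=200)                         # hypothetical sample
F = np.column_stack([V, V ** 2])                  # f(v) = v and g(v) = v^2

draws = multiplier_draws(F, n_draws=10_000, rng=rng)
sample_cov = np.cov(F, rowvar=False, bias=True)   # right-hand side of (62)
simulated_cov = draws.T @ draws / draws.shape[0]  # Monte Carlo estimate of the left-hand side
```

Holding the data fixed, the simulated second moments of the draws should be close to the sample covariance matrix, which is the content of (62).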

In order to estimate the derivative $\mathbb{D}_{n,P}(\theta)$ we for concreteness adopt a construction
that is applicable to nondifferentiable generalized residuals $\rho(X_i, \cdot) : \Theta \cap R \to \mathbf{R}^J$.
Specifically, we employ a local difference of the empirical process by setting

$$\hat{\mathbb{D}}_n(\theta)[h] = \frac{1}{\sqrt{n}}\sum_{i=1}^n \Big(\rho\Big(X_i, \theta + \frac{h}{\sqrt{n}}\Big) - \rho(X_i, \theta)\Big) * q_n^{k_n}(Z_i) \tag{63}$$

for any $\theta \in \Theta_n \cap R$ and $h \in \mathbf{B}_n$; see also Hong et al. (2010) for a related study on
numerical derivatives. We note, however, that while we adopt the estimator in (63) due
to its general applicability, alternative approaches may be preferable in models where
the generalized residual $\rho(X_i, \theta)$ is actually differentiable in $\theta$; see Remark 6.1.
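
A minimal Python sketch of the local difference in (63) follows. It assumes, for illustration only, a single residual ($J = 1$), a scalar parameter standing in for the sieve coefficient, and a hypothetical linear residual $\rho(X_i, \theta) = Y_i - \theta W_i$ with instruments $(1, Z_i)$; none of these choices come from the paper. In this linear case the local difference coincides with the analytic expression $-h \cdot n^{-1}\sum_i W_i q(Z_i)$, which the sketch computes for comparison.

```python
import numpy as np

def local_difference_derivative(rho, q, X, Z, theta, h):
    """n^{-1/2} sum_i (rho(X_i, theta + h / sqrt(n)) - rho(X_i, theta)) * q(Z_i)."""
    n = len(X)
    diff = rho(X, theta + h / np.sqrt(n)) - rho(X, theta)    # shape (n,)
    return (diff[:, None] * q(Z)).sum(axis=0) / np.sqrt(n)   # shape (k_n,)

def rho(X, theta):
    """Hypothetical linear residual: X[:, 0] is the outcome, X[:, 1] the regressor."""
    return X[:, 0] - theta * X[:, 1]

def q(Z):
    """Hypothetical instrument functions: a constant and Z itself (k_n = 2)."""
    return np.column_stack([np.ones_like(Z), Z])

rng = np.random.default_rng(1)
n = 500
Z = rng.normal(size=n)
W = Z + rng.normal(size=n)
Y = 2.0 * W + rng.normal(size=n)
X = np.column_stack([Y, W])

h = 0.7
numerical = local_difference_derivative(rho, q, X, Z, theta=2.0, h=h)
analytic = -h * (W[:, None] * q(Z)).mean(axis=0)
```

Because the residual here is linear in the parameter, the two vectors agree up to floating point error; for nondifferentiable residuals only the local difference is available.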
Remark 6.1. In settings in which the generalized residual $\rho(X_i, \theta)$ is pathwise partially
differentiable in $\theta$ $P$-almost surely, we may instead define $\hat{\mathbb{D}}_n(\theta)[h]$ to be

$$\hat{\mathbb{D}}_n(\theta)[h] = \frac{1}{n}\sum_{i=1}^n \nabla_\theta\rho(X_i, \theta)[h] * q_n^{k_n}(Z_i) \;, \tag{64}$$

where $\nabla_\theta\rho(x, \theta)[h] = \frac{\partial}{\partial\tau}\rho(x, \theta + \tau h)|_{\tau = 0}$. It is worth noting that, when applicable,
employing (64) in place of (63) is preferable because the former is linear in $h$, and thus
the resulting bootstrap statistic $\hat{U}_n(R)$ (as in (60)) is simpler to compute. ■

6.1.2 The Local Parameter Space 

The remaining component we require to obtain a bootstrap approximation is a suitable 
sample analogue for the local parameter space. We next develop such a sample analogue, 
which may be of independent interest as it is more broadly applicable to hypothesis test¬ 
ing problems concerning general equality and inequality restrictions in settings beyond 
the conditional moment restriction model; see Appendix E for the relevant results. 

6.1.2.1 Related Assumptions 

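The construction of an approximation to the local parameter space first requires us to
impose additional conditions on the sieve $\Theta_n \cap R$ and the restriction maps $\Upsilon_F$ and $\Upsilon_G$.

Assumption 6.1. (i) For some $K_b < \infty$, $\|h\|_{\mathbf{E}} \leq K_b\|h\|_{\mathbf{B}}$ for all $n$, $h \in \mathbf{B}_n$; (ii) For
some $\epsilon > 0$, $\bigcup_{P \in \mathbf{P}_0}\{\theta \in \mathbf{B}_n : \overrightarrow{d}_H(\{\theta\}, \Theta_{0n}(P) \cap R, \|\cdot\|_{\mathbf{B}}) < \epsilon\} \subseteq \Theta_n$ for all $n$.

Assumption 6.2. There exist $K_g < \infty$, $M_g < \infty$, and $\epsilon > 0$ such that for all $n$, $P \in \mathbf{P}_0$,
$\theta_0 \in \Theta_{0n}(P) \cap R$, and $\theta_1, \theta_2 \in \{\theta \in \mathbf{B}_n : \overrightarrow{d}_H(\{\theta\}, \Theta_{0n}(P) \cap R, \|\cdot\|_{\mathbf{B}}) < \epsilon\}$: (i) There is
a linear map $\nabla\Upsilon_G(\theta_1) : \mathbf{B} \to \mathbf{G}$ satisfying $\|\Upsilon_G(\theta_1) - \Upsilon_G(\theta_2) - \nabla\Upsilon_G(\theta_1)[\theta_1 - \theta_2]\|_{\mathbf{G}} \leq
K_g\|\theta_1 - \theta_2\|_{\mathbf{B}}^2$; (ii) $\|\nabla\Upsilon_G(\theta_1) - \nabla\Upsilon_G(\theta_0)\|_o \leq K_g\|\theta_1 - \theta_0\|_{\mathbf{B}}$; (iii) $\|\nabla\Upsilon_G(\theta_1)\|_o \leq M_g$.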

Assumption 6.1(i) imposes that the norm $\|\cdot\|_{\mathbf{B}}$ be weakly stronger than the norm
$\|\cdot\|_{\mathbf{E}}$ uniformly on the sieve $\Theta_n \cap R$.$^{10}$ We note that even though the local parameter
space $V_n(\theta, \ell)$ is determined by the $\|\cdot\|_{\mathbf{E}}$ norm (see (52)), Assumption 6.1(i) implies
restricting the norm $\|\cdot\|_{\mathbf{B}}$ instead can deliver a subset of $V_n(\theta, \ell)$. In turn, Assumption
6.1(ii) demands that $\Theta_{0n}(P) \cap R$ be contained in the interior of $\Theta_n$ uniformly in $P \in \mathbf{P}$.
We emphasize, however, that such a requirement does not rule out binding parameter
space restrictions. Instead, Assumption 6.1(ii) simply requires that all such restrictions
be explicitly stated through the set $R$; see Remarks 6.2 and 6.3. Finally, Assumption
6.2 imposes that $\Upsilon_G : \mathbf{B} \to \mathbf{G}$ be Fréchet differentiable in a neighborhood of $\Theta_{0n}(P) \cap R$
with locally Lipschitz continuous and norm bounded derivative $\nabla\Upsilon_G(\theta) : \mathbf{B} \to \mathbf{G}$.

10 Since $\mathbf{B}_n$ is finite dimensional, there always exists a constant $K_n$ such that $\|h\|_{\mathbf{E}} \leq K_n\|h\|_{\mathbf{B}}$ for all
$h \in \mathbf{B}_n$. Thus, the main content of Assumption 6.1(i) is that $K_b$ does not depend on $n$.

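In order to introduce analogous requirements for the map $\Upsilon_F : \mathbf{B} \to \mathbf{F}$ we first define

$$\mathbf{F}_n = \overline{\text{span}}\Big\{\bigcup_{\theta \in \mathbf{B}_n} \Upsilon_F(\theta)\Big\} \;, \tag{65}$$

where recall for any set $C$, $\overline{\text{span}}\{C\}$ denotes the closure of the linear span of $C$ - i.e. $\mathbf{F}_n$
denotes the closed linear span of the range of $\Upsilon_F : \mathbf{B}_n \to \mathbf{F}$. In addition, for any linear
map $\Gamma : \mathbf{B} \to \mathbf{F}$ we denote its null space by $\mathcal{N}(\Gamma) = \{h \in \mathbf{B} : \Gamma(h) = 0\}$. Given these
definitions, we next impose the following requirements on $\Upsilon_F$ and its relation to $\Upsilon_G$: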

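Assumption 6.3. There exist $K_f < \infty$, $M_f < \infty$, and $\epsilon > 0$ such that for all $n$, $P \in \mathbf{P}_0$,
$\theta_0 \in \Theta_{0n}(P) \cap R$, and $\theta_1, \theta_2 \in \{\theta \in \mathbf{B}_n : \overrightarrow{d}_H(\{\theta\}, \Theta_{0n}(P) \cap R, \|\cdot\|_{\mathbf{B}}) < \epsilon\}$: (i) There is
a linear map $\nabla\Upsilon_F(\theta_1) : \mathbf{B} \to \mathbf{F}$ satisfying $\|\Upsilon_F(\theta_1) - \Upsilon_F(\theta_2) - \nabla\Upsilon_F(\theta_1)[\theta_1 - \theta_2]\|_{\mathbf{F}} \le K_f \|\theta_1 - \theta_2\|_{\mathbf{B}}^2$;
(ii) $\|\nabla\Upsilon_F(\theta_1) - \nabla\Upsilon_F(\theta_0)\|_o \le K_f \|\theta_1 - \theta_0\|_{\mathbf{B}}$; (iii) $\|\nabla\Upsilon_F(\theta_1)\|_o \le M_f$;
(iv) $\nabla\Upsilon_F(\theta_1) : \mathbf{B}_n \to \mathbf{F}_n$ admits a right inverse $\nabla\Upsilon_F(\theta_1)^-$ with $K_f \|\nabla\Upsilon_F(\theta_1)^-\|_o \le M_f$.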

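Assumption 6.4. Either (i) $\Upsilon_F : \mathbf{B} \to \mathbf{F}$ is linear, or (ii) There are constants $\epsilon > 0$,
$K_d < \infty$ such that for every $P \in \mathbf{P}_0$, $n$, and $\theta_0 \in \Theta_{0n}(P) \cap R$ there exists a $h_0 \in
\mathbf{B}_n \cap \mathcal{N}(\nabla\Upsilon_F(\theta_0))$ satisfying $\Upsilon_G(\theta_0) + \nabla\Upsilon_G(\theta_0)[h_0] \le -\epsilon 1_{\mathbf{G}}$ and $\|h_0\|_{\mathbf{B}} \le K_d$.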

Assumptions 6.3 and 6.4 mark an important difference between hypotheses in which
$\Upsilon_F$ is linear and those in which $\Upsilon_F$ is nonlinear - in fact, in the former case Assumptions
6.3 and 6.4 are always satisfied. This distinction reflects that when $\Upsilon_F$ is linear its impact
on the local parameter space is known and hence need not be estimated. In contrast,
when $\Upsilon_F$ is nonlinear its role in determining the local parameter space depends on
the point of evaluation $\theta_0 \in \Theta_{0n}(P) \cap R$ and is as a result unknown.$^{11}$ In particular,
we note that while Assumptions 6.3(i)-(iii) impose smoothness conditions analogous
to those required of $\Upsilon_G$, Assumption 6.3(iv) additionally demands that the derivative
$\nabla\Upsilon_F(\theta) : \mathbf{B}_n \to \mathbf{F}_n$ possess a norm bounded right inverse for all $\theta$ in a neighborhood of
$\Theta_{0n}(P) \cap R$. Existence of a right inverse is equivalent to the surjectivity of the derivative
$\nabla\Upsilon_F(\theta) : \mathbf{B}_n \to \mathbf{F}_n$ and hence amounts to the classical rank condition (Newey and
McFadden, 1994). In turn, the requirement that the right inverse's operator norm be
uniformly bounded is imposed for simplicity.$^{12}$ Finally, Assumption 6.4(ii) specifies the
relation between $\Upsilon_F$ and $\Upsilon_G$ when the former is nonlinear. Heuristically, Assumption
6.4(ii) requires the existence of a local perturbation to $\theta_0 \in \Theta_{0n}(P) \cap R$ that relaxes the
“active” inequality constraints without a first order effect on the equality restriction.
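
The rank condition in Assumption 6.3(iv) can be made concrete with a minimal sketch. It assumes the simplest possible setting, a two dimensional sieve and a scalar equality constraint, so that the derivative is a row vector. The constraint `theta_2 - theta_1**2`, the function names, and the minimum-norm choice of right inverse are illustrative assumptions and are not taken from the paper.

```python
import math


def grad_upsilon_f(theta):
    """Derivative (a row vector) of the hypothetical constraint
    Upsilon_F(theta) = theta_2 - theta_1**2, evaluated at theta."""
    return [-2.0 * theta[0], 1.0]


def right_inverse(grad):
    """Minimum-norm right inverse of the linear map h -> <grad, h> from R^d to R.

    It exists only when grad is nonzero, i.e. when the map is surjective,
    which is the rank condition in this scalar case.
    """
    sq_norm = sum(g * g for g in grad)
    if sq_norm == 0.0:
        raise ValueError("derivative is zero, so no right inverse exists")

    def apply(c):
        # Returns h with <grad, h> = c.
        return [c * g / sq_norm for g in grad]

    return apply


def right_inverse_norm(grad):
    """Operator norm of the minimum-norm right inverse, equal to 1 / ||grad||_2."""
    return 1.0 / math.sqrt(sum(g * g for g in grad))


grad = grad_upsilon_f([1.0, 1.0])
h = right_inverse(grad)(3.0)
recovered = sum(g * x for g, x in zip(grad, h))  # applying the derivative to h
```

In this hypothetical example the second entry of the derivative always equals one, so the right inverse has operator norm at most one at every point of evaluation, which is the kind of uniform bound Assumption 6.3(iv) asks for.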

11 For linear $\Upsilon_F$, the requirement $\Upsilon_F(\theta + h/\sqrt{n}) = 0$ is equivalent to $\Upsilon_F(h) = 0$ for any $\theta \in R$. In
contrast, if $\Upsilon_F$ is nonlinear, then the set of $h \in \mathbf{B}_n$ for which $\Upsilon_F(\theta + h/\sqrt{n}) = 0$ can depend on $\theta \in R$.

12 Recall for a linear map $\Gamma : \mathbf{B}_n \to \mathbf{F}_n$, its right inverse is a map $\Gamma^- : \mathbf{F}_n \to \mathbf{B}_n$ such that $\Gamma\Gamma^-(h) = h$
for any $h \in \mathbf{F}_n$. The right inverse $\Gamma^-$ need not be unique if $\Gamma$ is not bijective, in which case Assumption
6.3(iv) is satisfied as long as it holds for some right inverse of $\nabla\Upsilon_F(\theta) : \mathbf{B}_n \to \mathbf{F}_n$.

Remark 6.2. Certain parameter space restrictions can be incorporated through the
map $\Upsilon_G : \mathbf{B} \to \mathbf{G}$. Newey and Powell (2003), for example, address estimation in ill-posed
inverse problems by requiring the parameter space $\Theta$ to be compact. In our present
context, and assuming $X_i \in \mathbf{R}$ for notational simplicity, their smoothness requirements
correspond to setting $\mathbf{B}$ to be the Hilbert space with inner product



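for some integer $J > 0$ and $\delta > 1/2$, and letting $\Theta = \{\theta \in \mathbf{B} : \|\theta\|_{\mathbf{B}} \le B\}$. It is then
straightforward to incorporate this restriction through the map $\Upsilon_G : \mathbf{B} \to \mathbf{R}$ by letting
$\mathbf{G} = \mathbf{R}$ and defining $\Upsilon_G(\theta) = \|\theta\|_{\mathbf{B}}^2 - B^2$. Moreover, given these definitions, Assumption
6.2 is satisfied with $\nabla\Upsilon_G(\theta)[h] = 2\langle\theta, h\rangle_{\mathbf{B}}$, $K_g = 2$, and $M_g = B$. ■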
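
The claim that the norm constraint has a quadratic linearization remainder can be checked numerically. The sketch below is only illustrative: it replaces the Hilbert space of Remark 6.2 with a three dimensional Euclidean space (an assumption made so the example is self-contained), and the vectors and the bound are arbitrary.

```python
def inner(u, v):
    """Euclidean inner product, standing in for the Hilbert space inner product."""
    return sum(a * b for a, b in zip(u, v))


def upsilon_g(theta, bound):
    """Constraint map Upsilon_G(theta) = ||theta||^2 - bound^2."""
    return inner(theta, theta) - bound ** 2


def grad_upsilon_g(theta, h):
    """Derivative of Upsilon_G at theta applied to h, i.e. 2 <theta, h>."""
    return 2.0 * inner(theta, h)


def linearization_remainder(theta1, theta2, bound):
    """Upsilon_G(theta1) - Upsilon_G(theta2) - grad Upsilon_G(theta1)[theta1 - theta2]."""
    diff = [a - b for a, b in zip(theta1, theta2)]
    return (upsilon_g(theta1, bound) - upsilon_g(theta2, bound)
            - grad_upsilon_g(theta1, diff))


theta1, theta2, bound = [0.3, -0.4, 0.1], [0.1, 0.2, -0.5], 1.0
diff = [a - b for a, b in zip(theta1, theta2)]
sq_dist = inner(diff, diff)
remainder = linearization_remainder(theta1, theta2, bound)
```

Expanding the squares shows that the remainder equals minus the squared distance between the two points, so the quadratic bound in Assumption 6.2(i) holds with room to spare when $K_g = 2$.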

Remark 6.3. The consistency result of Lemma 4.1 and Assumption 6.1(ii) together
imply that the minimum of $Q_n(\theta)$ over $\Theta_n \cap R$ is attained on the interior of $\Theta_n \cap R$
relative to $\mathbf{B}_n \cap R$. Therefore, if the restriction set $R$ is convex and $Q_n(\theta)$ is convex in
$\theta$ and well defined on $\mathbf{B}_n \cap R$ (rather than $\Theta_n \cap R$), then it follows that

\[ I_n(R) = \inf_{\theta \in \mathbf{B}_n \cap R} \Big\| \frac{1}{\sqrt{n}} \sum_{i=1}^n \rho(X_i, \theta) * q_n^{k_n}(Z_i) \Big\|_{\hat\Sigma_n, r} + o_p(a_n) \tag{67} \]

uniformly in $P \in \mathbf{P}_0$, i.e. the constraint $\theta \in \Theta_n$ can be omitted in computing $I_n(R)$. ■

6.1.2.2 Construction and Intuition 

Given the introduced assumptions, we next construct a sample analogue for the local
parameter space $V_n(\theta_0, \ell_n)$ of an element $\theta_0 \in \Theta_{0n}(P) \cap R$. To this end, we note that
by Assumption 6.1(ii) $V_n(\theta_0, \ell_n)$ is asymptotically determined solely by the equality
and inequality constraints. Thus, the construction of a suitable sample analogue for
$V_n(\theta_0, \ell_n)$ intuitively only requires estimating the impact on the local parameter space
that is induced by the maps $\Upsilon_F$ and $\Upsilon_G$ - a goal we accomplish by examining the impact
such constraints have on the local parameter space of a corresponding $\hat\theta_n \in \hat\Theta_n \cap R$.

Figure 2: Approximating Impact of Inequality Constraints

In order to account for the role inequality constraints play in determining the local
parameter space, we conservatively estimate “binding” sets in analogy to what is done
in the partially identified literature.$^{13}$ Specifically, for a sequence $\{r_n\}_{n=1}^\infty$ we define

\[ \hat G_n(\theta) = \Big\{ \frac{h}{\sqrt{n}} \in \mathbf{B}_n : \Upsilon_G\Big(\theta + \frac{h}{\sqrt{n}}\Big) \le \Big(\Upsilon_G(\theta) - K_g r_n \Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}} 1_{\mathbf{G}}\Big) \vee (-r_n 1_{\mathbf{G}}) \Big\} , \tag{68} \]

where recall $1_{\mathbf{G}}$ is the order unit in the AM space $\mathbf{G}$, $g_1 \vee g_2$ represents the (lattice)
supremum of any two elements $g_1, g_2 \in \mathbf{G}$, and $K_g$ is as in Assumption 6.2. Figure 2
illustrates the construction in the case in which $X_i \in \mathbf{R}$, $\mathbf{B}$ is the set of continuous functions
of $X_i$, and we aim to test whether $\theta_0(x) \le 0$ for all $x \in \mathbf{R}$. In this setting, assuming
no equality constraints for simplicity, the local parameter space for $\theta_0$ corresponds to
the set of perturbations $h/\sqrt{n}$ such that $\theta_0 + h/\sqrt{n}$ remains negative, i.e. any function
$h/\sqrt{n} \in \mathbf{B}_n$ in the shaded region of the left panel of Figure 2.$^{14}$ For an estimator $\hat\theta_n$ of
$\theta_0$, the set $\hat G_n(\hat\theta_n)$ in turn consists of perturbations $h/\sqrt{n}$ to $\hat\theta_n$ such that $\hat\theta_n + h/\sqrt{n}$
is not “too close” to the zero function to accommodate the estimation uncertainty in
$\hat\theta_n$ - i.e. any function $h/\sqrt{n} \in \mathbf{B}_n$ in the shaded region of the right panel of Figure
2. Intuitively, as $\hat\theta_n$ converges to $\theta_0$ the set $\hat G_n(\hat\theta_n)$ is thus asymptotically contained in,
i.e. smaller than, the local parameter space of $\theta_0$ which delivers size control. Unlike
Figure 2, however, in settings for which $\Upsilon_G$ is nonlinear we must further account for the
curvature of $\Upsilon_G$ which motivates the presence of the term $K_g r_n \|h/\sqrt{n}\|_{\mathbf{B}} 1_{\mathbf{G}}$ in (68).

13 See Chernozhukov et al. (2007), Galichon and Henry (2009), Linton et al. (2010), and Andrews and
Soares (2010).
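
The construction in (68) can be sketched for the setting of Figure 2, in which the inequality map is the identity, so that $K_g = 0$ and the lattice supremum is a pointwise maximum. The sketch below evaluates functions on a three point grid. The grid, the numerical values, and the function name `in_g_hat` are assumptions made purely for illustration.

```python
def in_g_hat(theta_hat, perturbation, r_n):
    """Check membership in the set (68) on a grid when Upsilon_G is the
    identity map: theta_hat + perturbation <= max(theta_hat, -r_n) pointwise."""
    return all(
        t + p <= max(t, -r_n)
        for t, p in zip(theta_hat, perturbation)
    )


theta_hat = [-1.0, -0.05, -0.5]   # estimate evaluated at three grid points
r_n = 0.1

# Moves the estimate up only where the constraint is far from binding.
up_where_slack = [0.5, 0.0, 0.2]
# Moves the estimate up where it is within r_n of zero.
up_near_binding = [0.0, 0.02, 0.0]

accepts_slack_move = in_g_hat(theta_hat, up_where_slack, r_n)
accepts_near_binding_move = in_g_hat(theta_hat, up_near_binding, r_n)
accepts_with_infinite_r = in_g_hat(theta_hat, up_where_slack, float("inf"))
```

The second perturbation keeps the function below zero and yet is excluded, which reflects the conservative nature of the set: points within $r_n$ of zero are treated as if the inequality were binding. With an infinite $r_n$ every point is treated as binding, so only nonpositive perturbations are admitted.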

While employing $\hat G_n(\theta)$ allows us to address the role inequality constraints play on
the local parameter space of a $\theta_0 \in \Theta_{0n}(P) \cap R$, we account for equality constraints by
examining their impact on the local parameter space of a corresponding $\hat\theta_n \in \hat\Theta_n \cap R$.
Specifically, for a researcher chosen $\ell_n \downarrow 0$ we define $\hat V_n(\theta, \ell_n)$ (as utilized in (60)) by

\[ \hat V_n(\theta, \ell_n) = \Big\{ \frac{h}{\sqrt{n}} \in \mathbf{B}_n : \frac{h}{\sqrt{n}} \in \hat G_n(\theta), \ \Upsilon_F\Big(\theta + \frac{h}{\sqrt{n}}\Big) = 0 \ \text{and} \ \Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}} \le \ell_n \Big\} . \tag{69} \]

Thus, in contrast to $V_n(\theta, \ell_n)$ (as in (52)), the set $\hat V_n(\theta, \ell_n)$: (i) Replaces the requirement
$\Upsilon_G(\theta + h/\sqrt{n}) \le 0$ by $h/\sqrt{n} \in \hat G_n(\theta)$, (ii) Retains the constraint $\Upsilon_F(\theta + h/\sqrt{n}) = 0$,
and (iii) Substitutes the $\|\cdot\|_{\mathbf{E}}$ norm constraint by $\|h/\sqrt{n}\|_{\mathbf{B}} \le \ell_n$. Figure 3 illustrates
how (ii) and (iii) allow us to account for the impact of equality constraints in the special
case of no inequality constraints, $\mathbf{B} = \mathbf{R}^2$, and $\mathbf{F} = \mathbf{R}$. In this instance, the constraint
$\Upsilon_F(\theta) = 0$ corresponds to a curve on $\mathbf{R}^2$ (left panel), and similarly so does the local
parameter space $\hat V_n(\theta, +\infty)$ for any $\theta \in \mathbf{R}^2$ (right panel). Since all curves $\hat V_n(\theta, +\infty)$ pass
through zero, we note that all local parameter spaces are “similar” in a neighborhood
of the origin. However, for nonlinear $\Upsilon_F$ the size of the neighborhood of the origin in
which $\hat V_n(\hat\theta_n, +\infty)$ is “close” to $V_n(\theta_0, +\infty)$ crucially depends on both the distance of
$\hat\theta_n$ to $\theta_0$ and the curvature of $\Upsilon_F$ (compare $\hat V_n(\theta_1, +\infty)$ and $\hat V_n(\theta_2, +\infty)$ in Figure 3).
Heuristically, the set $\hat V_n(\hat\theta_n, \ell_n)$ thus estimates the role equality constraints play on the
local parameter space of $\theta_0$ by restricting attention to the expanding neighborhood of
the origin in which the local parameter space of $\hat\theta_n$ resembles that of $\theta_0$. In this regard,
it is crucial that the neighborhood be defined with respect to the norm under which $\Upsilon_F$
is smooth ($\|\cdot\|_{\mathbf{B}}$) rather than the potentially weaker norms $\|\cdot\|_{\mathbf{E}}$ or $\|\cdot\|_{\mathbf{L}}$.

Figure 3: Approximating Impact of Equality Constraints

14 Mathematically, $\mathbf{B} = \mathbf{G}$, $\Upsilon_G$ is the identity map, $K_g = 0$ since $\Upsilon_G$ is linear, the order unit $1_{\mathbf{G}}$ is
the function with constant value 1, and $g_1 \vee g_2$ is the pointwise (in $x$) maximum of the functions $g_1, g_2$.
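
The role of the norm bound in (69) can be illustrated for the setting of Figure 3. The sketch below assumes a hypothetical constraint $\Upsilon_F(\theta) = \theta_2 - \theta_1^2$ on $\mathbf{R}^2$, for which the perturbations that keep a point on the curve can be written in closed form. As a simplification, the neighborhood is defined by bounding only the first coordinate of the perturbation. None of these choices come from the paper.

```python
def local_curve(theta1, t):
    """Second coordinate s of a perturbation (t, s) that keeps the point
    (theta1, theta1**2) on the hypothetical curve theta_2 = theta_1**2."""
    return 2.0 * theta1 * t + t * t


def max_gap(theta1_a, theta1_b, ell, grid=200):
    """Largest vertical gap between the local parameter spaces at two points
    of evaluation, over perturbations whose first coordinate is at most ell
    in absolute value."""
    ts = [ell * (2.0 * k / grid - 1.0) for k in range(grid + 1)]
    return max(
        abs(local_curve(theta1_a, t) - local_curve(theta1_b, t)) for t in ts
    )


gap_wide = max_gap(1.0, 1.1, ell=0.5)     # larger neighborhood of the origin
gap_narrow = max_gap(1.0, 1.1, ell=0.05)  # smaller neighborhood of the origin
```

In this example the gap equals twice the distance between the two points of evaluation times the radius of the neighborhood, so the two local parameter spaces agree more closely as either the estimation error or the radius shrinks.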

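Remark 6.4. In instances where the constraints $\Upsilon_F : \mathbf{B} \to \mathbf{F}$ and $\Upsilon_G : \mathbf{B} \to \mathbf{G}$ are
both linear, controlling the norm $\|\cdot\|_{\mathbf{B}}$ is no longer necessary as the “curvature” of $\Upsilon_F$
and $\Upsilon_G$ is known. As a result, it is possible to instead set $\hat V_n(\theta, \ell_n)$ to equal

\[ \hat V_n(\theta, \ell_n) = \Big\{ \frac{h}{\sqrt{n}} \in \mathbf{B}_n : \frac{h}{\sqrt{n}} \in \hat G_n(\theta), \ \Upsilon_F\Big(\theta + \frac{h}{\sqrt{n}}\Big) = 0 \ \text{and} \ \Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{E}} \le \ell_n \Big\} ; \tag{70} \]

i.e. to weaken the constraint $\|h/\sqrt{n}\|_{\mathbf{B}} \le \ell_n$ in (69) to $\|h/\sqrt{n}\|_{\mathbf{E}} \le \ell_n$. Controlling
the norm $\|\cdot\|_{\mathbf{E}}$, however, may still be necessary in order to ensure that $\hat{\mathbb{D}}_n(\theta)[h]$ is a
consistent estimator for $\mathbb{D}_{n,P}(\theta)[h]$ uniformly in $h/\sqrt{n} \in \hat V_n(\theta, \ell_n)$. ■

6.2 Bootstrap Approximation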


Having introduced the substitutes for the isonormal process $\mathbb{W}_{n,P}$, the derivative $\mathbb{D}_{n,P}(\theta)$,
and the local parameter space $V_n(\theta, \ell_n)$, we next study the bootstrap statistic $\hat U_n(R)$
(as in (60)). To this end, we impose the following additional Assumptions:

Assumption 6.5. (i) $\sup_{f \in \mathcal{F}_n} \|\hat{\mathbb{W}}_n f q_n^{k_n} - \mathbb{W}^\star_{n,P} f q_n^{k_n}\|_r = o_p(a_n)$ uniformly in $P \in \mathbf{P}$
for $\mathbb{W}^\star_{n,P}$ an isonormal Gaussian process that is independent of $\{V_i\}_{i=1}^n$.

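Assumption 6.6. (i) For any $\epsilon > 0$, $\tau_n = o(S_n(\epsilon))$; (ii) The sequences $\ell_n$, $\tau_n$ satisfy
$k_n^{1/r}\sqrt{\log(k_n)} B_n \times \sup_{P \in \mathbf{P}} J_{[\,]}(\ell_n^{\kappa_\rho} \vee (\nu_n\tau_n)^{\kappa_\rho}, \mathcal{F}_n, \|\cdot\|_{L^2_P}) = o(a_n)$, $K_m \ell_n(\ell_n + \mathcal{R}_n + \nu_n\tau_n) \times
\mathcal{S}_n(\mathbf{L}, \mathbf{E}) = o(a_n n^{-1/2})$, and $\ell_n(\ell_n + \{\mathcal{R}_n + \nu_n\tau_n\} \times \mathcal{S}_n(\mathbf{B}, \mathbf{E}))1\{K_f > 0\} = o(a_n n^{-1/2})$;
(iii) The sequence $r_n$ satisfies $\limsup_{n\to\infty} 1\{K_g > 0\}\ell_n/r_n < 1/2$ and $(\mathcal{R}_n + \nu_n\tau_n) \times
\mathcal{S}_n(\mathbf{B}, \mathbf{E}) = o(r_n)$; (iv) Either $K_f = K_g = 0$ or $(\mathcal{R}_n + \nu_n\tau_n) \times \mathcal{S}_n(\mathbf{B}, \mathbf{E}) = o(1)$.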

Assumption 6.5 demands that the multiplier bootstrap process $\hat{\mathbb{W}}_n$ be coupled with
an isonormal process $\mathbb{W}^\star_{n,P}$ that is independent of the data $\{V_i\}_{i=1}^n$. Intuitively, this
condition requires that the multiplier bootstrap, which is automatically consistent for
Donsker classes, still be valid in the present non-Donsker setting. Moreover, in accord
with our requirements on the empirical process, Assumption 6.5 demands a coupling rate
faster than $a_n$ (see Assumption 5.1). We provide sufficient conditions for Assumption
6.5 in Appendix H that may be of independent interest; see Theorem H.1. In turn,
Assumption 6.6 collects the necessary bandwidth rate requirements, which we discuss
in more detail in Section 6.2.2. Assumption 6.6(i) in particular demands that $\tau_n \downarrow 0$
sufficiently fast to guarantee the one sided Hausdorff convergence of $\hat\Theta_n \cap R$. We note this
condition is satisfied by setting $\tau_n = 0$, which we recommend unless partial identification
is of particular concern. Similarly, Assumption 6.6(ii) requires $\ell_n \downarrow 0$ sufficiently fast
to ensure that $\hat{\mathbb{D}}_n(\theta)[h]$ is uniformly consistent over $\theta \in \hat\Theta_n \cap R$ and $h/\sqrt{n} \in \hat V_n(\theta, \ell_n)$,
and that both the intuitions behind Figures 1 and 3 are indeed valid. The latter two
requirements on $\ell_n$ are respectively automatically satisfied by linear models ($K_m = 0$)
or linear restrictions ($K_f = 0$). Assumption 6.6(iii) specifies the requirements on $r_n$,
which amount to $r_n$ not decreasing to zero faster than the $\|\cdot\|_{\mathbf{E}}$-rate of convergence.
Finally, Assumption 6.6(iv) guarantees the directed Hausdorff consistency of $\hat\Theta_n \cap R$
under $\|\cdot\|_{\mathbf{B}}$ in nonlinear problems, thus allowing $\hat V_n(\hat\theta_n, \ell_n)$ to properly account for the
impact of the curvatures of $\Upsilon_F$ and $\Upsilon_G$ on the local parameter space; recall Figure 3.
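
As a rough illustration of the multiplier bootstrap process referenced in Assumption 6.5, the sketch below assumes the standard Gaussian multiplier form, in which centered observations are reweighted by i.i.d. standard normal draws that are independent of the data and the sum is scaled by $1/\sqrt{n}$. This form, the function name `multiplier_draw`, and its arguments are assumptions of the sketch rather than definitions taken from this section.

```python
import math
import random


def multiplier_draw(values, weights=None, rng=random):
    """One draw of a Gaussian multiplier bootstrap process evaluated at a
    single function f, given values = [f(V_1), ..., f(V_n)].

    Computes n^{-1/2} * sum_i w_i * (f(V_i) - sample mean of f).
    If no weights are supplied, i.i.d. N(0, 1) weights are drawn from rng.
    """
    n = len(values)
    mean = sum(values) / n
    if weights is None:
        weights = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return sum(w * (v - mean) for w, v in zip(weights, values)) / math.sqrt(n)


values = [1.0, 2.0, 3.0]
constant_weights = multiplier_draw(values, weights=[1.0, 1.0, 1.0])
contrast_weights = multiplier_draw(values, weights=[1.0, 0.0, -1.0])
random_draw = multiplier_draw(values, rng=random.Random(0))
```

Because the observations are centered at their sample mean, constant weights return exactly zero, so all of the variation in a draw comes from the random multipliers and none from the data being resampled.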

Given the stated assumptions, the following theorem establishes an unconditional
coupling of $\hat U_n(R)$ that provides the basis for our subsequent inference results.

Theorem 6.1. Let Assumptions 2.1(i), 2.2(i), 3.1, 3.2, 3.3, 3.4, 4.1, 5.1, 5.2, 5.3(i),
5.3(iii), 5.4, 6.1, 6.2, 6.3, 6.4, 6.5, and 6.6 hold. Then, uniformly in $P \in \mathbf{P}_0$

\[ \hat U_n(R) \ge \inf_{\theta \in \Theta_{0n}(P) \cap R} \ \inf_{h/\sqrt{n} \in V_n(\theta, 2K_b\ell_n)} \big\| \mathbb{W}^\star_{n,P}\rho(\cdot, \theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h] \big\|_{\Sigma_n(P), r} + o_p(a_n) . \]

Theorem 6.1 shows that with unconditional probability tending to one uniformly on
$P \in \mathbf{P}_0$ our bootstrap statistic is bounded from below by a random variable that is
independent of the data. The significance of this result lies in that the lower bound
is equal in distribution to the upper bound for $I_n(R)$ derived in Theorem 5.1(i), and
moreover that the rates of both couplings are controlled by $a_n$. Thus, Theorems 5.1(i)
and 6.1 provide the basis for establishing that comparing $I_n(R)$ to the quantiles of $\hat U_n(R)$
conditional on the data provides asymptotic size control - a claim we formalize in Section
6.3. Before establishing such a result, however, we first examine whether the conclusion
of Theorem 6.1 can be strengthened to hold with equality rather than inequality - i.e.
whether an analogue to Theorem 5.1(ii) is available. Unfortunately, as is well understood
from the moment inequalities literature, such a uniform coupling is not possible when
inequality constraints are present. As we next show, however, Theorem 6.1 can be
strengthened to hold with equality under conditions similar to those of Theorem 5.1(ii)
in the important case of hypotheses concerning only equality restrictions.

6.2.1 Special Case: No Inequality Constraints 

In this section we focus on the special yet important case in which the hypothesis of
interest concerns solely equality restrictions. Such a setting encompasses, for example,
the construction of confidence regions for functionals of the parameter $\theta_0$ without imposing
shape restrictions; see e.g. Horowitz (2007), Gagliardini and Scaillet (2012), and
Chen and Pouzo (2015) among others. Formally, we temporarily assume $R$ equals

\[ R = \{\theta \in \mathbf{B} : \Upsilon_F(\theta) = 0\} . \tag{71} \]

Under this extra structure the formulation of the test and bootstrap statistics remain
largely unchanged, with the exception that the set $\hat V_n(\theta, \ell_n)$ simplifies to

\[ \hat V_n(\theta, \ell_n) = \Big\{ \frac{h}{\sqrt{n}} \in \mathbf{B}_n : \Upsilon_F\Big(\theta + \frac{h}{\sqrt{n}}\Big) = 0 \ \text{and} \ \Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}} \le \ell_n \Big\} \tag{72} \]

(compare to (69)). Since these specifications are a special case of our general framework,
Theorem 6.1 continues to apply.$^{15}$ In fact, as the following Theorem shows, the
conclusion of Theorem 6.1 can be strengthened under the additional structure afforded
by (71) and conditions analogous to those imposed in Theorem 5.1(ii).

Theorem 6.2. Let Assumptions 2.1(i), 2.2(i), 3.1, 3.2, 3.3, 3.4, 4.1, 5.1, 5.2, 5.3, 5.4,
6.1, 6.3, 6.5, 6.6(i)-(ii) hold, the set $R$ satisfy (71), and $(\mathcal{R}_n + \nu_n\tau_n) \times \mathcal{S}_n(\mathbf{B}, \mathbf{E}) = o(\ell_n)$.

(i) If $\tau_n$ satisfies $(k_n^{1/r}\sqrt{\log(k_n)} J_n B_n/\sqrt{n} + \zeta_n) = o(\tau_n)$, then uniformly in $P \in \mathbf{P}_0$

\[ \hat U_n(R) = \inf_{\theta \in \Theta_{0n}(P) \cap R} \ \inf_{h/\sqrt{n} \in V_n(\theta, 2K_b\ell_n)} \big\| \mathbb{W}^\star_{n,P}\rho(\cdot, \theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h] \big\|_{\Sigma_n(P), r} + o_p(a_n) . \]

(ii) If $\Theta_{0n}(P) \cap R = \{\theta_{0n}(P)\}$ and $\Sigma_n(P) = \{\mathrm{Var}_P\{\rho(X_i, \theta_{0n}(P)) q_n^{k_n}(Z_i)\}\}^{-1/2}$ for every
$P \in \mathbf{P}_0$ and in addition $r = 2$, then for $c_n = \dim\{\mathbf{B}_n \cap \mathcal{N}(\nabla\Upsilon_F(\theta_{0n}(P)))\}$ we have

\[ \hat U_n(R) = \{\chi^2_{k_n - c_n}\}^{1/2} + o_p(a_n) , \]

uniformly in $P \in \mathbf{P}_0$, where $\chi^2_d$ is a $d$-degrees of freedom chi-squared random variable.

15 To see (71) and (72) are a special case of (4) and (69) respectively, let $\mathbf{G} = \mathbf{R}$, $\Upsilon_G(\theta) = -1$ for all
$\theta \in \mathbf{B}$, and then note $\{\theta \in \mathbf{B} : \Upsilon_G(\theta) \le 0\} = \mathbf{B}$ and $\hat G_n(\theta) = \mathbf{B}_n$ for all $\theta$ and $r_n$.


Besides assuming a lack of inequality constraints, Theorem 6.2 demands that the
rate of convergence $\mathcal{R}_n$ satisfy $\mathcal{R}_n \mathcal{S}_n(\mathbf{B}, \mathbf{E}) = o(\ell_n)$. In view of Assumption 6.6(ii) the
latter requirement can be understood as imposing that either $\Upsilon_F$ and $\rho(X_i, \cdot)$ are linear
in $\theta$ ($K_f = K_m = 0$), or the rate of convergence $\mathcal{R}_n$ is sufficiently fast - conditions that
may rule out severely ill-posed nonlinear problems as also demanded in Theorem 5.1(ii).
Given these requirements, and provided $\tau_n \downarrow 0$ slowly enough to ensure the Hausdorff
convergence of $\hat\Theta_n \cap R$, Theorem 6.2(i) strengthens the conclusion of Theorem 6.1 to hold
with equality rather than inequality. Moreover, the random variable to which $\hat U_n(R)$
is coupled by Theorem 6.2(i) shares the same distribution as the random variable to
which $I_n(R)$ is coupled by Theorem 5.1(ii). Thus, Theorems 5.1(ii) and 6.2(i) together
provide us with the basis for establishing that the asymptotic size of the proposed test
can equal its significance level. In turn, Theorem 6.2(ii) shows that whenever $\Theta_{0n}(P) \cap R$
is a singleton, $r$ and $\Sigma_n(P)$ may be chosen so that the coupled random variable has a
pivotal distribution - a result that enables the use of analytical critical values.

Remark 6.5. Under suitable conditions Theorem 6.2(ii) can be generalized to show

\[ \hat U_n(R) = \inf_{\theta \in \Theta_{0n}(P) \cap R} \ \inf_{v \in \mathbb{V}_{n,P}(\theta)} \big\| \mathbb{W}^\star_{n,P}\rho(\cdot, \theta) * q_n^{k_n} - v \big\|_{\Sigma_n(P), r} + o_p(a_n) \tag{73} \]

uniformly in $P \in \mathbf{P}_0$ for $\mathbb{V}_{n,P}(\theta)$ a vector subspace of $\mathbf{R}^{k_n}$ possibly depending on $P$
and $\theta \in \Theta_{0n}(P) \cap R$. Theorem 6.2(ii) can then be seen to follow from (73) by setting
$\Theta_{0n}(P) \cap R = \{\theta_{0n}(P)\}$ and $r = 2$. However, in general the characterization in (73) is
not pivotal and thus does not offer an advantage over Theorem 6.2(i). In this regard,
we note that setting $r = 2$ is important to ensure pivotality as projections onto linear
subspaces may not admit linear selectors otherwise (Deutsch, 1982). ■


6.2.2 Discussion: Bandwidths 

In constructing our bootstrap approximation we have introduced three bandwidth parameters:
$\tau_n$, $r_n$, and $\ell_n$. While these bandwidths are necessary for a successful bootstrap
approximation in the most general setting, there are fortunately a number of
applications in which not all three bandwidths are required. With the aim of providing
guidance on their selection, we therefore next revisit the role of $\tau_n$, $r_n$, and $\ell_n$ and
discuss instances in which these bandwidths may be ignored in computation.

The bandwidth $\tau_n$ was first introduced in Section 4 in the construction of the set
estimator $\hat\Theta_n \cap R$. Its principal requirement is that it converge to zero sufficiently fast in
order to guarantee the directed Hausdorff consistency of $\hat\Theta_n \cap R$. Since directed Hausdorff
consistency is equivalent to Hausdorff consistency when $\Theta_{0n}(P) \cap R$ is a singleton, $\tau_n$
should therefore always be set to zero in models that are known to be identified; e.g. in
Examples 2.1, 2.2, and 2.3. In settings where $\Theta_{0n}(P) \cap R$ is not a singleton, however,
$\tau_n$ must also decrease to zero sufficiently slowly if we additionally desire $\hat\Theta_n \cap R$ to be
Hausdorff consistent for $\Theta_{0n}(P) \cap R$. The latter stronger form of consistency can lead to
a more powerful test when $\Theta_{0n}(P) \cap R$ is not a singleton, as illustrated by a comparison
of Theorems 6.1 and 6.2(i). Nonetheless, even in partially identified settings it may be
preferable to set $\tau_n$ to zero to simplify implementation - this is the approach implicitly
pursued by Bugni et al. (2014), for example, in a related problem.

Allowing for inequality restrictions led us to introduce the bandwidth $r_n$ in the
construction of the sample analogue to the local parameter space. Specifically, the
role of $r_n$ is to account for the impact of inequality constraints on the local parameter
space and is thus unnecessary in settings where only equality restrictions are present
- e.g. in Section 6.2.1. In this regard, the bandwidth $r_n$ may be viewed as analogous
to the inequality selection approach pursued in the moment inequalities literature. In
particular, its principal requirement is that it decrease to zero sufficiently slowly, with
overly “aggressive” choices of $r_n$ potentially causing size distortions. As in the moment
inequalities literature, however, we may always set $r_n = +\infty$ which corresponds to the
“least favorable” local parameter space of an element $\theta \in \Theta_n \cap R$ satisfying $\Upsilon_G(\theta) = 0$
- i.e. all inequalities bind.

The final bandwidth $\ell_n$, which to the best of our knowledge does not have a precedent
in the literature, plays three distinct roles. First, it ensures that the estimated derivative
$\hat{\mathbb{D}}_n(\theta)[h]$ is consistent for $\mathbb{D}_{n,P}(\theta)[h]$ uniformly in $h/\sqrt{n} \in \hat V_n(\theta, \ell_n)$. Second, $\ell_n$ restricts
the local parameter space to the regions where a linear approximation to the drift of
the Gaussian process is indeed warranted - recall Theorem 5.1 and Figure 1. Third, it
accounts for the potential nonlinearity of $\Upsilon_F$ and $\Upsilon_G$ by limiting the estimated local
parameter space to areas where it asymptotically resembles the true local parameter
space - recall Figures 2 and 3. As a result, the requirements on $\ell_n$ weaken in applications
where the challenges it is meant to address are not present - for instance, when the
generalized residual $\rho(X_i, \cdot)$ and/or the constraints $\Upsilon_F$ and $\Upsilon_G$ are linear, as can be
seen by evaluating Assumption 6.6(ii)-(iii) when $K_m$, $K_f$, or $K_g$ are zero.

In certain applications, it is moreover possible to show the bandwidth $\ell_n$ is unnecessary
by arguing that the constraint $\|h/\sqrt{n}\|_{\mathbf{B}} \le \ell_n$ (as in (69)) is asymptotically slack.
The following Lemma, for example, provides sufficient conditions for this occurrence.

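Lemma 6.1. Suppose for some $\epsilon > 0$ it follows that $\|h\|_{\mathbf{E}} \le \nu_n \|\mathbb{D}_{n,P}(\theta)[h]\|_r$ for all
$\theta \in (\Theta_{0n}(P) \cap R)^\epsilon$, $P \in \mathbf{P}_0$, and $h \in \sqrt{n}\{\mathbf{B}_n \cap R - \theta\}$. If in addition

\[ \sup_{\theta \in (\Theta_{0n}(P) \cap R)^\epsilon} \ \sup_{h \in \sqrt{n}\{\mathbf{B}_n \cap R - \theta\} : \|h/\sqrt{n}\|_{\mathbf{B}} \ge \ell_n} \frac{\|\hat{\mathbb{D}}_n(\theta)[h] - \mathbb{D}_{n,P}(\theta)[h]\|_r}{\|h\|_{\mathbf{E}}} = o_p(\nu_n^{-1}) \tag{74} \]

uniformly in $P \in \mathbf{P}_0$, Assumptions 3.1, 3.2(i), 3.3, 3.4, 4.1, 6.5, 6.6(i) hold, and
$\mathcal{S}_n(\mathbf{B}, \mathbf{E})\mathcal{R}_n = o(\ell_n)$, then it follows that uniformly in $P \in \mathbf{P}_0$

\[ \hat U_n(R) = \inf_{\theta \in \hat\Theta_n \cap R} \ \inf_{h/\sqrt{n} \in \hat V_n(\theta, +\infty)} \big\| \hat{\mathbb{W}}_n\rho(\cdot, \theta) * q_n^{k_n} + \hat{\mathbb{D}}_n(\theta)[h] \big\|_{\hat\Sigma_n, r} + o_p(a_n) . \tag{75} \]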


Heuristically, Lemma 6.1 establishes the constraint $\|h/\sqrt{n}\|_{\mathbf{B}} \le \ell_n$ is asymptotically
not binding provided $\ell_n \downarrow 0$ sufficiently slowly ($\mathcal{S}_n(\mathbf{B}, \mathbf{E})\mathcal{R}_n = o(\ell_n)$).$^{16}$ In order for $\ell_n$
to simultaneously satisfy such a requirement and Assumption 6.6(ii)-(iii), however, it must
be that either the rate of convergence $\mathcal{R}_n$ is adequately fast, or that both the generalized
residual and the equality constraint are linear. Thus, while it may be possible to set
$\ell_n$ to be infinite in applications such as Examples 2.1-2.4, the bandwidth $\ell_n$ can remain
necessary in severely ill-posed nonlinear problems; see Appendix F.


6.3 Critical Values 

The conclusions of Theorem 5.1 and Theorems 6.1 and 6.2 respectively provide us with
an approximation and an estimator for the distribution of our test statistic. In this
section, we conclude our main results by formally establishing the properties of a test
that rejects the null hypothesis whenever $I_n(R)$ is larger than the appropriate quantile
of our bootstrap approximation. To this end, we therefore define

\[ c_{n,1-\alpha}(P) = \inf\{u : P(I_n(R) \le u) \ge 1 - \alpha\} \tag{76} \]

\[ \hat c_{n,1-\alpha} = \inf\{u : P(\hat U_n(R) \le u \mid \{V_i\}_{i=1}^n) \ge 1 - \alpha\} ; \tag{77} \]

i.e. $c_{n,1-\alpha}(P)$ denotes the $1 - \alpha$ quantile of $I_n(R)$, while $\hat c_{n,1-\alpha}$ denotes the corresponding
quantile of the bootstrap statistic conditional on the sample.
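
In practice the conditional quantile in (77) is approximated from simulated draws of the bootstrap statistic. The following is a minimal sketch of that step, assuming the draws have already been computed. The function names and the small numerical tolerance guarding against floating point rounding are choices made for the sketch.

```python
import math


def bootstrap_critical_value(draws, alpha):
    """Smallest u such that the empirical fraction of draws <= u is at
    least 1 - alpha, mirroring the infimum in (77)."""
    if not draws:
        raise ValueError("at least one bootstrap draw is required")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(draws)
    # Number of draws that must lie at or below the critical value; the
    # tolerance avoids rounding (1 - alpha) * n up past an exact integer.
    k = math.ceil((1.0 - alpha) * len(ordered) - 1e-9)
    return ordered[max(k, 1) - 1]


def reject(test_statistic, draws, alpha):
    """Reject the null when the statistic exceeds the bootstrap critical value."""
    return test_statistic > bootstrap_critical_value(draws, alpha)


draws = [float(u) for u in range(1, 11)]  # ten hypothetical bootstrap draws
cv_10 = bootstrap_critical_value(draws, 0.10)
cv_05 = bootstrap_critical_value(draws, 0.05)
```

With ten equally weighted draws the 90 percent critical value is the ninth order statistic, while the 95 percent critical value has to move to the tenth because nine draws cover only 90 percent of the mass.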

We additionally impose the following two final Assumptions: 

Assumption 6.7. There exists a $\delta > 0$ such that for all $\epsilon > 0$ and all $\tilde\alpha \in [\alpha - \delta, \alpha + \delta]$ it
follows that $\sup_{P \in \mathbf{P}_0} P(c_{n,1-\tilde\alpha}(P) - \epsilon \le I_n(R) \le c_{n,1-\tilde\alpha}(P) + \epsilon) \le \varrho_n(\epsilon \wedge 1) + o(1)$, where
the concentration parameter $\varrho_n$ is smaller than the coupling rate parameter, namely
$\varrho_n \le a_n^{-1}$.

16 Following Remark 6.4, if the constraints $\Upsilon_F$ and $\Upsilon_G$ are linear, then it suffices that $\mathcal{R}_n = o(\ell_n)$.


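Assumption 6.8. (i) There exists a $\gamma_z > 0$ and maps $\pi_{n,P,j} : \Theta_n \cap R \to \mathbf{R}^{k_{n,j}}$ such
that $\sup_{P \in \mathbf{P}} \sup_{\theta \in \Theta_n \cap R} \{E_P[(E_P[\rho_j(X_i, \theta) | Z_{i,j}] - q_{n,j}^{k_{n,j}}(Z_{i,j})'\pi_{n,P,j}(\theta))^2]\}^{1/2} = O(k_n^{-\gamma_z})$ for
all $1 \le j \le J$; (ii) The eigenvalues of $E_P[q_{n,j}^{k_{n,j}}(Z_{i,j}) q_{n,j}^{k_{n,j}}(Z_{i,j})']$ are bounded away from
zero uniformly in $1 \le j \le J$, $n \in \mathbf{N}$, and $P \in \mathbf{P}$.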


It is well known that uniformly consistent estimation of an approximating distribution
is not sufficient for establishing asymptotic size control; see, e.g. Romano and Shaikh
(2012). Intuitively, in order to get good size control when critical values are estimated
with noise, the approximate distribution must be suitably continuous at the quantile
of interest uniformly in $P \in \mathbf{P}_0$. Assumption 6.7 imposes precisely this requirement,
allowing the modulus of continuity, captured here by the concentration parameter $\varrho_n$, to
deteriorate with the sample size provided that $\varrho_n \le a_n^{-1}$ - that is, the loss of continuity
must occur at a rate slower than the coupling rate $o(a_n)$ of Theorems 5.1, 6.1, and
6.2. We refer the reader to Chernozhukov et al. (2013, 2014) for further discussion
and motivation of conditions of this type, called anti-concentration conditions there.$^{17}$
Note that in some typical cases, the rate of concentration is $\varrho_n = 1$ with $r = 2$ and
$\varrho_n \sim \sqrt{\log(k_n)}$ with $r = \infty$, which means that the conditions on the coupling rate $o(a_n)$
arising from imposing Assumption 6.7 are mild in these cases and are expected to be mild
in others. In turn, Assumption 6.8 imposes sufficient conditions for studying the power
of the proposed test. In particular, Assumption 6.8(i) demands that the transformations
$\{q_{k,n,j}\}_{k=1}^{k_{n,j}}$ be able to approximate conditional moments given $Z_{i,j}$ and thus be capable
of detecting violations of the null hypothesis. Finally, Assumption 6.8(ii) enables us to
characterize the set of local distributions against which the test is consistent.

Theorem 6.3 exploits our previous results and the introduced assumptions to characterize
the asymptotic size and power properties of our test.

Theorem 6.3. Let the conditions imposed in Theorem 5.1(i) and Theorem 6.1 hold.

(i) If in addition Assumption 6.7 is satisfied, then we can conclude that

\[ \limsup_{n \to \infty} \sup_{P \in \mathbf{P}_0} P(I_n(R) > \hat c_{n,1-\alpha}) \le \alpha . \]

(ii) If Assumption 6.7 and the conditions of Theorem 6.2(i) hold and $\mathcal{R}_n = o(\ell_n)$, then

\[ \limsup_{n \to \infty} \sup_{P \in \mathbf{P}_0} | P(I_n(R) > \hat c_{n,1-\alpha}) - \alpha | = 0 . \]

(iii) Let $\mathbf{P}_{1,n}(M) = \{P \in \mathbf{P} : \inf_{\theta \in \Theta \cap R} \{\sum_{j=1}^J \|E_P[\rho_j(X_i, \theta) | Z_{i,j}]\|_{L^2_P}\} \ge M\gamma_n\}$ for
$\gamma_n = \sqrt{k_n \log(k_n)} B_n J_n/\sqrt{n} + k_n^{-\gamma_z}$. If in addition Assumption 6.8 holds, then

\[ \liminf_{M \uparrow \infty} \liminf_{n \to \infty} \inf_{P \in \mathbf{P}_{1,n}(M)} P(I_n(R) > \hat c_{n,1-\alpha}) = 1 . \]

17 Alternatively, Assumption 6.7 can be dispensed with by adding a fixed constant $\eta > 0$ to the critical
value, i.e. using $\hat c_{n,1-\alpha} + \eta$ as the critical value; this approach is not satisfactory, since $\eta$ is arbitrary
and there is no adequate theory for setting it.

The first claim of Theorem 6.3 exploits Theorems 5.1(i) and 6.1 to show that the proposed
test delivers asymptotic size control. In turn, Theorem 6.3(ii) leverages Theorems
5.1(ii) and 6.2(ii) to conclude that the asymptotic size of the proposed test can equal
its significance level when no inequality constraints are present and either the model is
linear or the rate of convergence $\mathcal{R}_n$ is sufficiently fast. Under the latter structure it is
also possible to obtain the same conclusion employing analytical critical values by exploiting
Theorem 6.2(ii); see Remark 6.6. Finally, Theorem 6.3(iii) characterizes local sequences
$P_n \in \mathbf{P} \setminus \mathbf{P}_0$ for which our test has nontrivial local power.

Remark 6.6. When $\Theta_{0n}(P) \cap R$ is a singleton $\{\theta_{0n}(P)\}$ for all $P \in \mathbf{P}_0$, Theorem 6.2(ii)
provides conditions under which the bootstrap statistic is in fact coupled to the square
root of a chi-squared random variable. For $\chi^2_{1-\alpha}(d)$ the $1 - \alpha$ quantile of a chi-squared
random variable with $d$ degrees of freedom it is then possible to show that

\[ \limsup_{n \to \infty} \sup_{P \in \mathbf{P}_0} | P(I_n^2(R) > \chi^2_{1-\alpha}(k_n - c_n)) - \alpha | = 0 \tag{78} \]

where recall $c_n = \dim\{\mathbf{B}_n \cap \mathcal{N}(\nabla\Upsilon_F(\theta_{0n}(P)))\}$. As in Theorem 6.3(ii), however, we
emphasize such a conclusion does not apply to nonlinear problems in which the rate of
convergence is not sufficiently fast, or to hypotheses involving inequality restrictions. ■
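
Analytical critical values of this kind require only a chi-squared quantile. The sketch below computes one from first principles, using a series expansion of the regularized incomplete gamma function together with bisection, so that no external library is needed. The function names and the example dimensions are assumptions made for illustration. Comparing the statistic to the square root of the quantile is equivalent to comparing its square to the quantile itself, since the statistic is nonnegative.

```python
import math


def chi2_cdf(x, d):
    """CDF of a chi-squared variable with d degrees of freedom, computed via
    the series for the regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    a, z = d / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_quantile(p, d):
    """Quantile of the chi-squared distribution with d degrees of freedom,
    found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, d) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, d) < p:
            lo = mid
        else:
            hi = mid
    return hi


def analytic_critical_value(alpha, k_n, c_n):
    """Square root of the 1 - alpha chi-squared quantile with k_n - c_n
    degrees of freedom, to be compared with the statistic itself."""
    return math.sqrt(chi2_quantile(1.0 - alpha, k_n - c_n))


# Hypothetical dimensions: six moments and a four dimensional null space.
critical_value = analytic_critical_value(0.05, k_n=6, c_n=4)
```

With two degrees of freedom the chi-squared distribution is exponential, so the quantile has the closed form $-2\log(\alpha)$, which provides a convenient check on the numerical routine.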

Remark 6.7. In the conditional moment inequalities literature, certain test statistics
have been shown to converge in probability to zero when all inequalities are “slack”
(Linton et al., 2010). It is worth noting that an analogous problem, which could potentially
conflict with Assumption 6.7, is not automatically present in our setting. In
particular, we observe that since $V_n(\theta_0, \ell_n) \subseteq \mathbf{B}_n$, Lemma 5.1 implies

\[ I_n(R) \ge \inf_{\theta_0 \in \Theta_{0n}(P) \cap R} \ \inf_{h \in \mathbf{B}_n} \Big\| \mathbb{W}_{n,P}\rho(\cdot, \theta_0) * q_n^{k_n} + \sqrt{n} P\rho\Big(\cdot, \theta_0 + \frac{h}{\sqrt{n}}\Big) * q_n^{k_n} \Big\|_{\Sigma_n(P), r} + o_p(a_n) \tag{79} \]

and that under regularity conditions the right hand side of (79) is non-degenerate when
$\dim\{\mathbf{B}_n\} < k_n$; see also our simulations of a test of monotonicity in Section 7. ■


7 Simulation Evidence 

We examine the finite sample performance of the proposed test through a simulation
study based on the nonparametric instrumental variable model

\[ Y_i = \theta_0(X_i) + \epsilon_i , \tag{80} \]

where $\theta_0$ is an unknown function and $E_P[\epsilon_i | Z_i] = 0$ for an observable instrument $Z_i$. In
order to illustrate the different applications of our framework, we study both a test of
a shape restriction and a test on a functional of $\theta_0$ that imposes a shape restriction to
sharpen inference. Specifically, we examine the performance of a test of whether $\theta_0$ is
monotone, and of a test that imposes monotonicity to conduct inference on the value
of $\theta_0$ at a point. These applications are closely related to Examples 2.1 and 2.2 and we
refer the reader to their discussion in Appendix F for implementation details.


7.1 Design 


We consider a design in which random variables $(X_i^*, Z_i^*, \epsilon_i) \in \mathbf{R}^3$ follow the distribution

\[ \begin{pmatrix} X_i^* \\ Z_i^* \\ \epsilon_i \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} , \Sigma \right) \tag{81} \]

(of the $3 \times 3$ covariance matrix $\Sigma$ only the entries 1, 0.5, 1, and 0 are legible),
and $(X_i, Z_i) \in \mathbf{R}^2$ are generated according to $X_i = \Phi(X_i^*)$ and $Z_i = \Phi(Z_i^*)$ for $\Phi$ the
c.d.f. of a standard normal random variable. The dependent variable $Y_i$ is in turn
created according to (80) with the structural function $\theta_0$ following the specification

\[ \theta_0(x) = a\Big\{1 - 2\Phi\Big(\frac{x - 0.5}{a}\Big)\Big\} \tag{82} \]

for different choices of $a$. For all positive values of $a$, the function $\theta_0$ is monotonically
decreasing and satisfies $\theta_0(0.5) = 0$. Moreover, we also note $\theta_0(x) \approx 0$ for values of $a$
close to zero and $\theta_0(x) \approx \phi(0)(1 - 2x)$ for values of $a$ close to one and $\phi$ the derivative
of $\Phi$.$^{18}$ Thus, by varying $a$ in (82) we can examine the performance of our tests under
different “strengths” of monotonicity. All the reported results are based on five thousand
replications of samples $\{(Y_i, X_i, Z_i)\}_{i=1}^n$ consisting of five hundred observations each.
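
The stated properties of the structural function can be verified numerically. The sketch below assumes the functional form $\theta_0(x) = a\{1 - 2\Phi((x - 0.5)/a)\}$, which is a reading of (82) chosen because it is consistent with every property described in the text. The grid and the variable names are likewise assumptions.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def theta_0(x, a):
    """Structural function a * {1 - 2 * Phi((x - 0.5) / a)} for a > 0."""
    return a * (1.0 - 2.0 * _STD_NORMAL.cdf((x - 0.5) / a))


def linear_limit(x):
    """Limit phi(0) * (1 - 2x) of theta_0 as the parameter a grows large."""
    return _STD_NORMAL.pdf(0.0) * (1.0 - 2.0 * x)


grid = [k / 100 for k in range(101)]  # evaluation points on [0, 1]
values = [theta_0(x, 1.0) for x in grid]
is_decreasing = all(u > v for u, v in zip(values, values[1:]))
largest_gap = max(abs(theta_0(x, 1.0) - linear_limit(x)) for x in grid)
largest_small_a = max(abs(theta_0(x, 0.01)) for x in grid)
```

On this grid the function with $a = 1$ stays within about 0.016 of the linear limit, which agrees with the remark that the limit is already a close approximation for $a$ as small as one, while for $a = 0.01$ the function is bounded by $0.01$ in absolute value.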

$^{18}$ Formally, $\theta_0(x)$ converges to $0$ and $\phi(0)(1 - 2x)$ as $\sigma$ approaches $0$ and $\infty$ respectively. We find
numerically, however, that $\theta_0(x)$ is very close to $\phi(0)(1 - 2x)$ for all $x \in [0,1]$ for $\sigma$ as small as one.

As a sieve we employ B-splines $\{p_{j,n}\}_{j=1}^{j_n}$ of order three with continuous derivatives
and either one or no knots, which results in a dimension $j_n$ equal to four or three
respectively. Since B-splines of order three have piecewise linear derivatives, monotonicity
constraints are simple to implement, as we only need to check the value of the
derivative at $j_n - 1$ points. The instrument transformations $\{q_{k,n}\}_{k=1}^{k_n}$ are also chosen
to be B-splines of order three with continuous derivatives and either three, five, or ten
knots placed at the population quantiles. These parameter choices correspond to a total
number of moments $k_n$ equal to six, eight, or thirteen. The test statistic $I_n(R)$ is then
implemented with $r = 2$, and $\hat\Sigma_n$ equal to the optimal GMM weighting matrix computed
with a two stage least squares estimator constrained to satisfy the null hypothesis as
a first stage. Under these specifications, calculating $I_n(R)$ simply requires solving two
quadratic programming problems with linear constraints.
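To illustrate why monotonicity reduces to finitely many linear constraints, consider the case with no knots ($j_n = 3$). The sketch below assumes the quadratic B-spline basis on $[0,1]$ is built on a clamped knot sequence, in which case it coincides with the degree-two Bernstein basis; the exact basis construction used in the simulations is not stated here, so this is an illustrative assumption, and all function names are hypothetical.

```python
def bernstein2(x):
    """Degree-two Bernstein basis on [0, 1] (quadratic B-splines, no interior knots)."""
    return [(1 - x) ** 2, 2 * x * (1 - x), x ** 2]


def sieve_value(beta, x):
    """Value of the sieve function p(x)'beta."""
    return sum(b * p for b, p in zip(beta, bernstein2(x)))


def sieve_derivative(beta, x):
    """Derivative of p(x)'beta, which is linear in x for a quadratic sieve."""
    return 2 * ((beta[1] - beta[0]) * (1 - x) + (beta[2] - beta[1]) * x)


def is_decreasing(beta):
    """A linear derivative is nonpositive on [0, 1] iff it is nonpositive at
    both endpoints, i.e. j_n - 1 = 2 linear inequalities in beta."""
    return sieve_derivative(beta, 0.0) <= 0 and sieve_derivative(beta, 1.0) <= 0


print(is_decreasing([1.0, 0.5, 0.0]))  # coefficients decrease in order
print(is_decreasing([0.0, 1.0, 0.0]))  # hump shaped, not monotone
```

Under this basis the two inequalities are simply $\beta_2 \leq \beta_1$ and $\beta_3 \leq \beta_2$, which is why the constrained problems remain quadratic programs with linear constraints.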

Obtaining critical values further requires us to compute the quantiles of $\hat U_n(R)$ conditional
on the data, which we simulate employing two hundred bootstrap samples in
each replication. For the bandwidth choices, we set $\tau_n$ to zero which, despite being potentially
conservative under partial identification, is both sufficient for size control and
computationally simpler to implement. In turn, we explore data driven choices for the
parameters $r_n$ and $\ell_n$. Specifically, setting $p_n^{j_n}(x) = (p_{1,n}(x), \ldots, p_{j_n,n}(x))'$ and letting
$\mathbb{Z}_r \sim N(0, (\hat\Delta_n \hat\Sigma_n \hat\Delta_n')^{-1})$ with $\hat\Delta_n = \frac{1}{n}\sum_i p_n^{j_n}(X_i) q_n^{k_n}(Z_i)'$, we select $r_n$ by solving

$$q_r = P(\|p_n^{j_n\prime}\mathbb{Z}_r\|_{1,\infty} \leq r_n) \qquad (83)$$

for different choices of $q_r \in \{0.05, 0.95\}$. Heuristically, $r_n$ is thus the $q_r^{th}$ quantile of
an estimate of the asymptotic distribution of the $\|\cdot\|_{1,\infty}$ norm of the unconstrained
minimizer of $Q_n$ under fixed values for $j_n$ and $k_n$. We therefore interpret $q_r = 0.05$ as
an “aggressive” choice for $r_n$ and $q_r = 0.95$ as a “conservative” one. Finally, for $\mathbb{Z}_\ell$ a
$k_n \times j_n$ random matrix drawn from an estimate of the asymptotic distribution of $\hat\Delta_n$
under fixed $j_n$ and $k_n$ asymptotics, we select $\ell_n$ by solving$^{19}$

$$q_\ell = P\Big(\sup_{\beta \in \mathbf{R}^{j_n}: \|\beta\|_\infty \leq \ell_n} \|\mathbb{Z}_\ell \beta\|_{\hat\Sigma_n, 2} \leq 1\Big) \qquad (84)$$

for different values of $q_\ell \in \{0.05, 0.95\}$; see Remark 7.1 for the rationale behind this
choice. In concordance with the choice of $r_n$, here $q_\ell = 0.05$ also corresponds to the
“aggressive” choice of $\ell_n$ and $q_\ell = 0.95$ to the “conservative” one.
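The selection rule for $\ell_n$ can be sketched as follows. Since the supremum in (84) scales linearly in $\ell_n$, the solution is the reciprocal of the $q_\ell^{th}$ quantile of the supremum over the unit cube, and because a norm of a linear map is convex, that supremum is attained at a vertex of the cube. The sketch makes two simplifying assumptions that are not from the text: the draws of the random matrix have i.i.d. standard normal entries (a placeholder for the estimated asymptotic distribution), and the weighting matrix is the identity. All names are hypothetical.

```python
import itertools
import math
import random


def sup_over_unit_cube(Z):
    """sup of ||Z b||_2 over ||b||_inf <= 1, found by enumerating the vertices
    of the cube (a convex function on a polytope peaks at a vertex)."""
    j = len(Z[0])
    best = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=j):
        norm = math.sqrt(sum(sum(z * s for z, s in zip(row, signs)) ** 2
                             for row in Z))
        best = max(best, norm)
    return best


def empirical_quantile(draws, q):
    """Smallest draw whose empirical c.d.f. value is at least q."""
    ordered = sorted(draws)
    index = min(len(ordered) - 1, max(0, math.ceil(q * len(ordered)) - 1))
    return ordered[index]


def select_ell(q, sup_draws):
    """Bandwidth solving q = P(ell * sup <= 1): reciprocal of the q-th quantile."""
    return 1.0 / empirical_quantile(sup_draws, q)


rng = random.Random(0)
k_n, j_n, n_draws = 6, 3, 200  # two hundred draws, as in the text
sups = []
for _ in range(n_draws):
    # Placeholder distribution for the k_n x j_n random matrix (assumption).
    Z = [[rng.gauss(0.0, 1.0) for _ in range(j_n)] for _ in range(k_n)]
    sups.append(sup_over_unit_cube(Z))

ell_aggressive = select_ell(0.05, sups)
ell_conservative = select_ell(0.95, sups)
print(ell_aggressive, ell_conservative)
```

Enumerating vertices is feasible here because $j_n$ is three or four; the “aggressive” choice yields the larger bandwidth.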

Remark 7.1. In the hypothesis testing problems of this section, the norm constraint
$\|p_n^{j_n\prime}\beta/\sqrt{n}\|_{\mathbf{B}} \leq \ell_n$ in the definition of $\hat V_n(\theta, \ell_n)$ can be replaced by $\|\beta/\sqrt{n}\|_2 \leq \ell_n$; see
Remark 6.4 and the discussion of Example 2.2 in Appendix F. The latter constraint,
however, is in turn implied by $\|\beta/\sqrt{n}\|_\infty \leq \ell_n/\sqrt{j_n}$, which is computationally simpler to
implement as it is equivalent to $2j_n$ linear constraints on $\beta$. Moreover, when $\Upsilon_F$ and $\Upsilon_G$
are linear, the sole role of $\ell_n$ is to ensure $\hat{\mathbb{D}}_n(\theta)[h]$ is uniformly consistent for $\mathbb{D}_{n,P}(\theta)[h]$
- recall Section 6.2.2. In the present context, such convergence is implied by

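$$\sup_{\beta \in \mathbf{R}^{j_n}: \|\beta/\sqrt{n}\|_\infty \leq \ell_n} \Big\|\frac{1}{n}\sum_{i=1}^n q_n^{k_n}(Z_i)p_n^{j_n}(X_i)'\beta - E_P[q_n^{k_n}(Z_i)p_n^{j_n}(X_i)'\beta]\Big\|_{\hat\Sigma_n, 2} = o_p(a_n) , \qquad (85)$$

which motivates using (84) to study the sensitivity of our tests to the choice of $\ell_n$. ■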
Table 1: Monotonicity Test - Empirical Size

σ = 1
                     k_n = 6                k_n = 8                k_n = 13
j_n  q_ℓ   q_r    10%    5%     1%      10%    5%     1%      10%    5%     1%
3    5%    5%     0.084  0.044  0.010   0.087  0.042  0.009   0.095  0.046  0.009
3    5%    95%    0.060  0.030  0.006   0.068  0.033  0.007   0.079  0.037  0.008
3    95%   5%     0.083  0.043  0.010   0.086  0.042  0.009   0.095  0.046  0.009
3    95%   95%    0.060  0.030  0.006   0.068  0.033  0.007   0.079  0.037  0.008
4    5%    5%     0.048  0.023  0.004   0.055  0.028  0.005   0.070  0.032  0.006
4    5%    95%    0.047  0.023  0.004   0.055  0.027  0.005   0.069  0.031  0.006
4    95%   5%     0.047  0.023  0.004   0.055  0.027  0.005   0.070  0.032  0.006
4    95%   95%    0.047  0.023  0.004   0.055  0.027  0.005   0.069  0.031  0.006

σ = 0.1
                     k_n = 6                k_n = 8                k_n = 13
j_n  q_ℓ   q_r    10%    5%     1%      10%    5%     1%      10%    5%     1%
3    5%    5%     0.081  0.041  0.010   0.087  0.041  0.009   0.095  0.043  0.010
3    5%    95%    0.075  0.036  0.008   0.081  0.038  0.009   0.090  0.042  0.010
3    95%   5%     0.081  0.041  0.010   0.087  0.041  0.009   0.095  0.043  0.010
3    95%   95%    0.075  0.036  0.008   0.081  0.038  0.009   0.090  0.042  0.010
4    5%    5%     0.068  0.034  0.007   0.076  0.037  0.009   0.086  0.040  0.009
4    5%    95%    0.068  0.034  0.007   0.076  0.037  0.009   0.086  0.039  0.009
4    95%   5%     0.067  0.033  0.007   0.076  0.037  0.009   0.086  0.040  0.009
4    95%   95%    0.067  0.033  0.007   0.075  0.037  0.009   0.086  0.039  0.009

σ = 0.01
                     k_n = 6                k_n = 8                k_n = 13
j_n  q_ℓ   q_r    10%    5%     1%      10%    5%     1%      10%    5%     1%
3    5%    5%     0.102  0.050  0.012   0.102  0.052  0.012   0.109  0.053  0.011
3    5%    95%    0.100  0.049  0.012   0.100  0.050  0.012   0.107  0.052  0.011
3    95%   5%     0.102  0.050  0.012   0.102  0.052  0.012   0.109  0.053  0.011
3    95%   95%    0.100  0.049  0.012   0.100  0.050  0.012   0.107  0.052  0.011
4    5%    5%     0.099  0.049  0.011   0.100  0.049  0.013   0.103  0.052  0.011
4    5%    95%    0.099  0.049  0.011   0.100  0.049  0.013   0.103  0.052  0.011
4    95%   5%     0.099  0.048  0.011   0.100  0.049  0.013   0.103  0.052  0.011
4    95%   95%    0.098  0.048  0.011   0.100  0.049  0.013   0.103  0.052  0.011

7.2 Results 

We begin by first examining the performance of our inferential framework when applied
to test whether the structural function $\theta_0$ is monotonically decreasing. Table 1 reports
the empirical size control of the resulting test under the different parameter choices.
The test delivers good size control across specifications, though as expected it can be
undersized when $\theta_0$ is “strongly” monotonic ($\sigma = 1$). The empirical rejection rates
are insensitive to the value of $q_\ell$, which suggests the asymptotics of Lemma 6.1 are
applicable and the bandwidth $\ell_n$ is not needed to ensure size control. In contrast, the
empirical rejection rates are more responsive to the value of $q_r$, though the “aggressive”
choice of $q_r = 0.05$ is still able to deliver adequate size control even in the least favorable
configuration ($\sigma = 0.01$). Finally, we note increasing the dimension of the sieve ($j_n$) can
lead the test to be undersized, while increasing the number of moments ($k_n$) brings the
empirical size of the test closer to its nominal level.

$^{19}$ The $\ell_n$ solving (84) is simply the reciprocal of the $q_\ell^{th}$ quantile of $\sup_{\beta \in \mathbf{R}^{j_n}: \|\beta\|_\infty \leq 1} \|\mathbb{Z}_\ell \beta\|_{\hat\Sigma_n, 2}$, which
we approximate using a sample of two hundred draws of $\mathbb{Z}_\ell$.

Figure 4: Monotonicity Test - Empirical Power

In order to study the power of the test that $\theta_0$ is monotonically decreasing, we
consider deviations from the constant zero function ($\sigma = 0$). Specifically, we examine
the rejection probabilities of the test when the data is generated according to

$$Y_i = \delta X_i + \epsilon_i \qquad (86)$$

for different positive values of $\delta$. Figure 4 depicts the power function of a 5% nominal
level test implemented with $j_n = 3$, $q_\ell = q_r = 0.05$, and different numbers of moments
$k_n$. For the violation of decreasing monotonicity considered in (86), the test with fewer
moments appears to be more powerful, indicating the first few moments are the ones
detecting the deviation from monotonicity. More generally, however, we expect the
power ranking for the choices of $k_n$ to depend on the alternative under consideration.

Table 2: Level Test - Empirical Size

                  k_n = 6                k_n = 8                k_n = 13
σ     j_n     10%    5%     1%      10%    5%     1%      10%    5%     1%
1     3       0.106  0.051  0.010   0.105  0.054  0.012   0.107  0.056  0.012
1     4       0.072  0.034  0.006   0.074  0.036  0.008   0.078  0.038  0.008
0.1   3       0.106  0.052  0.010   0.106  0.055  0.013   0.107  0.055  0.011
0.1   4       0.073  0.034  0.006   0.075  0.035  0.008   0.076  0.038  0.008
0.01  3       0.106  0.052  0.010   0.105  0.054  0.012   0.107  0.056  0.011
0.01  4       0.073  0.034  0.006   0.074  0.036  0.008   0.077  0.038  0.008
Table 3: Level Test Imposing Monotonicity - Empirical Size

σ = 1
                     k_n = 6                k_n = 8                k_n = 13
j_n  q_ℓ   q_r    10%    5%     1%      10%    5%     1%      10%    5%     1%
3    5%    5%     0.077  0.037  0.008   0.082  0.041  0.007   0.092  0.043  0.008
3    5%    95%    0.053  0.026  0.005   0.061  0.030  0.005   0.075  0.033  0.008
3    95%   5%     0.077  0.037  0.008   0.082  0.041  0.007   0.092  0.043  0.008
3    95%   95%    0.053  0.026  0.005   0.061  0.030  0.005   0.075  0.033  0.008
4    5%    5%     0.055  0.026  0.006   0.063  0.029  0.006   0.073  0.033  0.008
4    5%    95%    0.055  0.026  0.006   0.063  0.029  0.006   0.073  0.033  0.008
4    95%   5%     0.055  0.026  0.006   0.063  0.029  0.006   0.073  0.033  0.008
4    95%   95%    0.055  0.026  0.006   0.063  0.029  0.006   0.073  0.033  0.008

σ = 0.1
                     k_n = 6                k_n = 8                k_n = 13
j_n  q_ℓ   q_r    10%    5%     1%      10%    5%     1%      10%    5%     1%
3    5%    5%     0.078  0.038  0.008   0.084  0.042  0.009   0.090  0.044  0.009
3    5%    95%    0.072  0.035  0.007   0.079  0.037  0.008   0.085  0.040  0.009
3    95%   5%     0.078  0.038  0.008   0.084  0.042  0.008   0.090  0.044  0.009
3    95%   95%    0.072  0.034  0.007   0.079  0.037  0.008   0.085  0.040  0.009
4    5%    5%     0.068  0.034  0.008   0.075  0.037  0.008   0.084  0.039  0.009
4    5%    95%    0.068  0.034  0.008   0.075  0.037  0.008   0.084  0.039  0.009
4    95%   5%     0.067  0.033  0.008   0.075  0.037  0.008   0.084  0.039  0.009
4    95%   95%    0.067  0.033  0.008   0.075  0.037  0.008   0.084  0.039  0.009

σ = 0.01
                     k_n = 6                k_n = 8                k_n = 13
j_n  q_ℓ   q_r    10%    5%     1%      10%    5%     1%      10%    5%     1%
3    5%    5%     0.102  0.053  0.012   0.106  0.054  0.013   0.109  0.054  0.011
3    5%    95%    0.100  0.051  0.011   0.104  0.053  0.012   0.107  0.053  0.011
3    95%   5%     0.102  0.053  0.012   0.106  0.054  0.013   0.109  0.054  0.011
3    95%   95%    0.100  0.051  0.011   0.104  0.053  0.012   0.107  0.053  0.011
4    5%    5%     0.101  0.051  0.011   0.103  0.049  0.012   0.106  0.052  0.011
4    5%    95%    0.101  0.051  0.011   0.103  0.049  0.012   0.106  0.052  0.011
4    95%   5%     0.101  0.051  0.011   0.102  0.049  0.012   0.106  0.052  0.011
4    95%   95%    0.101  0.051  0.011   0.102  0.049  0.012   0.106  0.052  0.011

Next, we apply our inferential framework to conduct inference on the value of the
structural function $\theta_0$ at the point $x = 0.5$ - recall that for all values of $\sigma$ in (82),
$\theta_0(0.5) = 0$. First, we examine the size control of a test that does not impose monotonicity,
so that the set $R$ consists of all functions $\theta$ satisfying $\theta(0.5) = 0$. For such a
hypothesis, $r_n$ is unnecessary and we therefore examine the quality of the chi-squared
approximation of Theorem 6.2(ii) and Remark 6.6. The empirical size of the corresponding
test is summarized in Table 2, which shows adequate size control and an
insensitivity to the “degree” of monotonicity ($\sigma$) of the structural function.

In addition, we also examine the size of a test that conducts inference on the level
of $\theta_0$ at the point $x = 0.5$ while imposing the monotonicity of $\theta_0$ - i.e. the set $R$
consists of monotonically decreasing functions satisfying $\theta(0.5) = 0$. Table 3 reports the
empirical size of the corresponding test under different parameter values. The results are
qualitatively similar to those corresponding to the test of monotonicity summarized in
Table 1. Namely, (i) all parameter choices yield adequate size control; (ii) the test can
be undersized in the strongly monotonic specifications ($\sigma = 1$); (iii) empirical rejection
rates are insensitive to the bandwidth $\ell_n$; and (iv) both the “conservative” ($q_r = 0.95$)
and “aggressive” ($q_r = 0.05$) choices for $r_n$ yield good size control.

Figure 5: Level Test - Empirical Power. Panels: $\sigma = 1$ and $k_n = 6$; $\sigma = 1$ and $k_n = 13$; $\sigma = 0.01$ and $k_n = 6$; $\sigma = 0.01$ and $k_n = 13$. Each panel plots the power curves of the Shape Restricted and Unrestricted tests.

Finally, we compare the power of the test that imposes monotonicity with the power
of the test that does not. To this end, we consider data generated according to

$$Y_i = \sigma\Big\{1 - 2\Phi\Big(\frac{X_i - 0.5}{\sigma}\Big)\Big\} + \delta + \epsilon_i , \qquad (87)$$

so that the structural function is still monotonically decreasing but satisfies $\theta_0(0.5) = \delta$
instead of the tested null hypothesis $\theta_0(0.5) = 0$. We implement the tests with $j_n = 3$
and $q_\ell = q_r = 0.05$, since this specification yields empirical size closest to the nominal
level of the test (rather than being undersized). The corresponding power curves are
depicted for different numbers of instruments ($k_n$) and degrees of monotonicity ($\sigma$) in Figure 5.
The power gains of imposing monotonicity are substantial, even when the true
structural function is “strongly” monotonic ($\sigma = 1$). This evidence is consistent with
our earlier claims of our framework being able to capture the strong finite-sample gains
from imposing monotonicity. At the nearly constant specification ($\sigma = 0.01$), the power
of the test that imposes monotonicity improves while the power of the test that does
not remains constant. As a result, the power differences between both tests are further
accentuated at $\sigma = 0.01$.


8 Conclusion 

In this paper, we have developed an inferential framework for testing “equality” and 
“inequality” constraints in models defined by conditional moment restrictions. Notably, 
the obtained results are sufficiently general to enable us to test for shape restrictions or to 
impose them when conducting inference. While our results focus on conditional moment 
restriction models, the insights developed for accounting for nonlinear local parameter 
spaces are more generally applicable to other settings. As such, we believe our theoretical 
analysis will be useful in the study of nonparametric constraints in complementary 
contexts such as likelihood based models. 
Appendix A - Notation and AM Spaces


For ease of reference, in this Appendix we collect the notation employed throughout 
the paper, briefly review AM spaces and their basic properties, and provide a description 
of the organization of the remaining Appendices. We begin with Table 4 below, which 
contains the norms and mathematical notation used. In turn, Table 5 presents the 
sequences utilized in the text as well as the location of their introduction. 


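Table 4: List of norms, spaces, and notation.

$a \lesssim b$    $a \leq Mb$ for some constant $M$ that is universal in the proof.
$\|\cdot\|_{L^q_P}$    For a measure $P$ and function $f$, $\|f\|^q_{L^q_P} = \int |f|^q dP$.
$\|\cdot\|_r$    For a vector $a = (a^{(1)}, \ldots, a^{(k)})'$, $\|a\|_r^r = \sum_{i=1}^k |a^{(i)}|^r$.
$\|\cdot\|_{o,r}$    For a $k \times k$ matrix $A$, $\|A\|_{o,r} = \sup_{\|a\|_r = 1} \|Aa\|_r$.
$\|\cdot\|_{\mathcal{G}}$    For a set $\mathcal{G}$ and a map $f : \mathcal{G} \to \mathbf{R}$, the norm $\|f\|_{\mathcal{G}} = \sup_{g \in \mathcal{G}} |f(g)|$.
$\vec{d}_H(\cdot, \cdot, \|\cdot\|)$    For sets $A, B$, $\vec{d}_H(A, B, \|\cdot\|) = \sup_{a \in A} \inf_{b \in B} \|a - b\|$.
$d_H(\cdot, \cdot, \|\cdot\|)$    For sets $A, B$, $d_H(A, B, \|\cdot\|) = \max\{\vec{d}_H(A, B, \|\cdot\|), \vec{d}_H(B, A, \|\cdot\|)\}$.
$N_{[\,]}(\epsilon, \mathcal{G}, \|\cdot\|)$    The $\epsilon$ bracketing numbers for a class $\mathcal{G}$ under $\|\cdot\|$.
$J_{[\,]}(\delta, \mathcal{G}, \|\cdot\|)$    The entropy integral $J_{[\,]}(\delta, \mathcal{G}, \|\cdot\|) = \int_0^\delta \{1 + \log N_{[\,]}(\epsilon, \mathcal{G}, \|\cdot\|)\}^{1/2} d\epsilon$.
$\mathcal{S}_n(A, B)$    The modulus of continuity of norms on normed spaces $A$ and $B$.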
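For readers less familiar with the directed and two-sided Hausdorff distances listed in Table 4, the following sketch computes both for finite sets. Restricting to finite sets (so that suprema and infima become maxima and minima) and using the sup norm on $\mathbf{R}^2$ are illustrative assumptions; the function names are hypothetical.

```python
def directed_hausdorff(A, B, dist):
    """sup over a in A of inf over b in B of dist(a, b), for finite A and B."""
    return max(min(dist(a, b) for b in B) for a in A)


def hausdorff(A, B, dist):
    """Two-sided Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(A, B, dist), directed_hausdorff(B, A, dist))


def sup_norm_dist(a, b):
    """Distance induced by the sup norm on R^d."""
    return max(abs(x - y) for x, y in zip(a, b))


A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0)]

print(directed_hausdorff(A, B, sup_norm_dist))  # A is not close to B
print(directed_hausdorff(B, A, sup_norm_dist))  # B is contained in A
print(hausdorff(A, B, sup_norm_dist))
```

The asymmetry is the point of the directed version: it vanishes whenever the first set is contained in the second, which is the property used when one set is shown to be a subset of another with high probability.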


Table 5: List of sequences.

$a_n$    A bound on the rate of convergence of the coupling results.
$B_n$    A bound on the sup norm of $\{q_{k,n,j}\}$. Introduced in Assumption 3.2(i).
$S_n(\epsilon)$    How “well separated” the minimum is. Introduced in Assumption 4.1(ii).
$J_n$    A bound on the entropy of $\mathcal{F}_n$. Introduced in Assumption 3.3(iii).
$k_n$    The number of moments employed.
$\mathcal{R}_n$    Convergence rate of $\hat\Theta_n \cap R$ with $\tau_n = o(n^{-1/2})$. Introduced in Theorem 4.1.
$\tau_n$    A sequence defining $\hat\Theta_n \cap R$. Introduced in equation (36).
$\nu_n$    Controls the strength of identification. Introduced in Assumption 4.2.
$\zeta_n$    A bound on the population minimum. Introduced in Assumption 4.1(i).


Since AM spaces are not often employed in econometrics, we next provide a brief 
introduction that highlights the properties we need for our analysis. The definitions 
and results presented here can be found in Chapters 8 and 9 of Aliprantis and Border 
(2006), and we refer the reader to said reference for a more detailed exposition. Before 
proceeding, we first recall the definitions of a partially ordered set and a lattice: 

Definition A.1. A partially ordered set $(\mathbf{G}, \geq)$ is a set $\mathbf{G}$ with a partial order relationship
$\geq$ defined on it - i.e. $\geq$ is a transitive ($x \geq y$ and $y \geq z$ implies $x \geq z$), reflexive
($x \geq x$), and antisymmetric ($x \geq y$ and $y \geq x$ implies $x = y$) relation. ■

Definition A.2. A lattice is a partially ordered set $(\mathbf{G}, \geq)$ such that any pair $x, y \in \mathbf{G}$
has a least upper bound (denoted $x \vee y$) and a greatest lower bound (denoted $x \wedge y$). ■

Whenever $\mathbf{G}$ is both a vector space and a lattice, it is possible to define objects that
depend on both the vector space and lattice operations. In particular, for $x \in \mathbf{G}$ the
positive part $x^+$, the negative part $x^-$, and the absolute value $|x|$ are defined by

$$x^+ = x \vee 0 \qquad x^- = (-x) \vee 0 \qquad |x| = x \vee (-x) . \qquad (A.1)$$

In addition, it is natural to demand that the order relation $\geq$ interact with the algebraic
operations of the vector space in a manner analogous to that of $\mathbf{R}$ - i.e. to expect

$$x \geq y \text{ implies } x + z \geq y + z \text{ for each } z \in \mathbf{G} \qquad (A.2)$$

$$x \geq y \text{ implies } \alpha x \geq \alpha y \text{ for each } 0 \leq \alpha \in \mathbf{R} . \qquad (A.3)$$

A complete normed vector space that shares these familiar properties of $\mathbf{R}$ under a given
order relation $\geq$ is referred to as a Banach lattice. Formally, we define:

Definition A.3. A Banach space $\mathbf{G}$ with norm $\|\cdot\|_{\mathbf{G}}$ is a Banach lattice if (i) $\mathbf{G}$ is a
lattice under $\geq$, (ii) $\|x\|_{\mathbf{G}} \leq \|y\|_{\mathbf{G}}$ when $|x| \leq |y|$, (iii) (A.2) and (A.3) hold. ■

An AM space is then simply a Banach lattice in which the norm $\|\cdot\|_{\mathbf{G}}$ is such
that the maximum of the norms of two positive elements is equal to the norm of the
maximum of the two elements - e.g. $L^\infty_P$ under pointwise ordering. A norm with
this property is called an M-norm.

Definition A.4. A Banach lattice $\mathbf{G}$ is called an AM space if for any elements $0 \leq
x, y \in \mathbf{G}$ it follows that $\|x \vee y\|_{\mathbf{G}} = \max\{\|x\|_{\mathbf{G}}, \|y\|_{\mathbf{G}}\}$. ■

In certain Banach lattices there may exist an element $\mathbf{1}_{\mathbf{G}} > 0$ called an order unit
such that for any $x \in \mathbf{G}$ there exists a $0 < \lambda \in \mathbf{R}$ for which $|x| \leq \lambda \mathbf{1}_{\mathbf{G}}$ - for example, in
$\mathbf{R}^d$ the vector $(1, \ldots, 1)'$ is an order unit. The order unit $\mathbf{1}_{\mathbf{G}}$ can be used to define

$$\|x\|_\infty = \inf\{\lambda > 0 : |x| \leq \lambda \mathbf{1}_{\mathbf{G}}\} , \qquad (A.4)$$

which is easy to see constitutes a norm on the original Banach lattice $\mathbf{G}$. In principle,
the norm $\|\cdot\|_\infty$ need not be related to the original norm $\|\cdot\|_{\mathbf{G}}$ with which $\mathbf{G}$ was
endowed. Fortunately, however, if $\mathbf{G}$ is an AM space, then the original norm $\|\cdot\|_{\mathbf{G}}$
and the norm $\|\cdot\|_\infty$ are equivalent in the sense that they generate the same topology
(Aliprantis and Border, 2006, page 358). Hence, without loss of generality we refer to
$\mathbf{G}$ as an AM space with unit $\mathbf{1}_{\mathbf{G}}$ if these conditions are satisfied: (i) $\mathbf{G}$ is an AM space,
(ii) $\mathbf{1}_{\mathbf{G}}$ is an order unit in $\mathbf{G}$, and (iii) the norm of $\mathbf{G}$ equals $\|\cdot\|_\infty$ (as in (A.4)).
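As a finite-dimensional illustration of these definitions, the sketch below works in $\mathbf{R}^d$ with the coordinatewise order. It computes the lattice operations in (A.1), checks the M-norm property for the sup norm, shows that the Euclidean norm fails it, and evaluates the order-unit norm (A.4) for the unit $(1, \ldots, 1)'$. The example vectors and function names are illustrative assumptions.

```python
def join(x, y):
    """Least upper bound under the coordinatewise order."""
    return [max(a, b) for a, b in zip(x, y)]


def positive_part(x):
    return join(x, [0.0] * len(x))


def negative_part(x):
    return join([-a for a in x], [0.0] * len(x))


def absolute_value(x):
    return join(x, [-a for a in x])


def sup_norm(x):
    return max(abs(a) for a in x)


def euclidean_norm(x):
    return sum(a * a for a in x) ** 0.5


def order_unit_norm(x, unit):
    """Smallest lambda with |x| <= lambda * unit, for a strictly positive unit."""
    return max(abs(a) / u for a, u in zip(x, unit))


x = [1.0, -2.0, 0.5]
print(positive_part(x), negative_part(x), absolute_value(x))

# Two positive elements with disjoint support.
u, v = [1.0, 0.0], [0.0, 1.0]
print(sup_norm(join(u, v)), max(sup_norm(u), sup_norm(v)))                    # equal
print(euclidean_norm(join(u, v)), max(euclidean_norm(u), euclidean_norm(v)))  # differ

print(order_unit_norm(x, [1.0, 1.0, 1.0]), sup_norm(x))  # coincide
```

In this example the order-unit norm built from $(1, \ldots, 1)'$ is exactly the sup norm, matching condition (iii) above.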
We conclude Appendix A by outlining the organization of the remaining Appendices.

Appendix B: Contains the proofs of the results in Section 4 concerning consistency of 
the set estimator (Lemma 4.1) and its rates of convergence (Theorem 4.1). ■ 

Appendix C: Develops the proofs for the results in Section 5, including the preliminary 
local approximation (Lemma 5.1) and the final drift linearization (Theorem 5.1). ■ 

Appendix D: Contains the proofs for all results in Section 6, including the lower bound 
for the bootstrap statistic (Theorem 6.1), conditions under which the lower bound cou¬ 
pling is “sharp” (Theorem 6.2), and the analysis of the test that compares the proposed 
test statistic to the quantiles of the bootstrap distribution (Theorem 6.3). ■ 

Appendix E: Develops the auxiliary results concerning the approximation of the local 
parameter space. These results depend on the characterization of R only, and thus may 
be of independent interest as they are broadly applicable to hypotheses testing problems 
similarly concerned with examining equality and inequality restrictions. ■ 

Appendix F: Provides additional details concerning the implementation of our test 
and the implications of our Assumptions in the context of the motivating examples 
introduced in Section 2.2. ■ 

Appendix G: Derives primitive conditions for verifying the coupling requirements of 
Assumption 5.1. The results employ the Hungarian construction in Koltchinskii (1994) 
and may be of independent interest. ■ 

Appendix H: Provides primitive conditions for the validity of the Gaussian multiplier
bootstrap as imposed in Assumption 6.5. These results more generally provide sufficient
conditions for the Gaussian multiplier bootstrap to be consistent for the law of
the empirical process over expanding classes $\mathcal{F}_n$, and may be of independent interest.
The arguments in this Appendix can also be employed to obtain alternative sufficient
conditions for Assumption 5.1 that complement those in Appendix G. ■


Appendix B - Proofs for Section 4 


Proof of Lemma 4.1: First fix e > 0 and notice that by definition of 0 n n R we have 



<P( 


inf 

6»e(0„nR)\(0 On (P)nR) e 



Qn { 0 )+ T n ) (B.l) 
for all $n$ and all $P \in \mathbf{P}_0$. Moreover, setting $Q_{n,P}(\theta) = \|\sqrt{n}E_P[\rho(X_i, \theta) * q_n^{k_n}(Z_i)]\|_{\hat\Sigma_n, r}$,
it then follows from Lemmas B.2 and B.3, and Markov’s inequality that


inf 

6»e(0„nR)\(e On (P)ni?) e 



Qn,p{9 ) 


< inf 

0 £(e n nR)\(e O n(P)r\Ry 


i n (ft) \ n V^g(k n )J n B n 

7a Qn{>)+ ° r( - ua - 1 


(B.2) 


uniformly in P G Pq. In addition, by similar arguments we obtain uniformly in P £ Pq 


■ ( 1 ^ ^ ■ f I ( Q \ . (hn y / log(/c n ) J n B n 

mf —}=Qn{9)< inf —=Q n p(9)+O p ( - 7 =- 

eee n rR Jn ee©„n r Jn Jn 


^ 11 y, || a , n /&r/ -\/log(A: n ) J n B n . . &r/ \/log( k n )J n B n 

< ||^n||o,r X Cn + L> p (- -j= -J = U p {Q n H- 7 =-J , (B.3) 


n 


n 


where the second inequality results from Assumption 4.1 (i) and the equality follows from 
Lemma B.3. For conciseness set rj n = (( n + r n + kl/ 1 y/\og(k n )J n B n /\/n). Then note 
that combining results (B.l), (B.2), and (B.3) we can conclude that 


lirnsup sup P("^_ff(@n n R, &on(B) FI R, || • ||b) > c) 
n-> oo PePo 

< lim sup lim sup sup P( inf —=Q n ^p(9) < Mrj n ) . (B.4) 

Mtoo n—>oo pgp 0 0g(0„nP)\(0on(P)nP) e yn 

Next note that for any a £ R fcn we have ||a|| r = ||S“ 1 S n a|| r < 1 ||o,r|M|v r (provided 

S” 1 exists). Thus, by Assumption 4.1 (ii) and Lemma B.3 we obtain for any M < oo 


limsup sup P{ inf —=Q n p{9 ) < Mrj n ) 

n—»oo PgP u 0e(©nnP)\(©on(P)nP) e \Jn 

< limsup sup P(S n (e) < \\B~ 1 \\ 0>r Mr] n ) = 0 (B.5) 

n— s-oo PePo 

which together with (B.4) establishes the first claim of the Lemma. 

In order to establish (44), we employ the definition of 0„nR to obtain for all P £ Po 

P{& 0 n(P) n R C 0 n n R) > P( sup ~^=Q n (9) < r n ) . (B.6) 

ee© 0 „(P)nP V n 

Therefore, setting S n = kl/ r y/log (k n )J n B n /y/n, exploiting Lemmas B.2 and B.3, and 
the definition of 0 q n(P) HR we then obtain uniformly in P £ Pq that 


SUp ~^=Q n (9) < SUp A=Qn,p(9) + O p (6 n ) 

6>e© 0 n(P)nP V n e»e© 0 n(P)nP V n 

<||Sn||o,rX inf \\E P [p(X i ,9)*qt(Z i )]\\ r + O p {6 n ) = O p (( n + 5 n ) . (B.7) 
6»e©„nP 
Hence, (44) follows from results (B.6), (B.7), and $\tau_n/(\delta_n + \zeta_n) \to \infty$. ■

Proof of Theorem 4.1: To begin, we first define the event A n = A n \ n A n 2 where 

A n i = {0 n n i? c (0 On (p) n i?) e } 

A n 2 = {S" 1 exists and max{||E“ 1 || 0 , r , ||E n || 0>r } < B} , (B.8) 

where recall (@o n (P)nR) e = {9 G O n (lR : ~^h{{0}, @ 0 n(P) Hi?, || • ||b) < e}- Moreover, 
note that for any e > 0 and B sufficiently large, Lemmas 4.1 and B.3 imply 


limsup sup P(A c n ) = 0 . (B.9) 

n—^00 PePo 

Hence, for r]~ l = v n {h P / r yJ\og{k n )J n B n /y/n + r n + Cn} we obtain for any M that 


limsup sup P(r ? „^(0 n n J R,0 On (P)n J R,|| ■ || E ) > 2 A/ ) 
n->oo PePo 

= limsup sup P{ri n ~l H (& n n R, 0 On (P) n R, || • || E ) > 2 M ; A n ) (B.10) 

n—>oo PePo 

by result (B.9). Next, for each P G Pq, partition (0 q n (P) 0 R) e \ (©o n{P) © R) into 

Sn,j(P) = {9e (00 n(P) n R) e : 2 J_1 < ({0}, 00 n(P) nR, II • He) < 2 J '} . (B.ll) 

Since 0 n n R C (0 On (P) n i?) e with probability tending to one uniformly in P E Po by 
(B.9), it follows from the definition of © n n R and result (B.10) that 


limsup sup P( V J H (e n © R, @on(P) 0 R, || • || E ) > 2 A/ ) 

n->oo PePo 

°° 1 1 

< limsup sup 'V' P( inf —Q n (9) < inf —Q n (0)+T n - A n ) . (B.12) 
n—>00 Pe p 0 ^ 0eS n ,j(P) y/n eee n nR y/n 


In addition, letting Q n ,p(0 ) = || v^n-EpfppQ,0) * r , we obtain from (B.8), 

Lemma B.2, and the definition of ( n in Assumption 4.1 (i) that under the event A n 


inf -^=Q n (6) < inf -^=Q nP (9) + 
0e0„n Ry/n eee„nRy/n 


Z n ,p < B{Z n: p + Cn} ■ (B.13) 


Therefore, exploiting result (B.12) and that (B.13) holds under A n we can conclude 


limsup sup P(rj n ~<t p (Q n n R, Q 0 n{P ) © R, || • || E ) > 2 M ) 
n->oo PePo 

1 

< limsup sup yy p( inf —=Q n (0) < B{Z n , P + Cn} + r n ; A n ) . (B.14) 
n->oo PeP 0 ®eS„,j(P) y/n 
Further note that for any $a \in \mathbf{R}^{k_n}$, $\|a\|_r = \|\hat\Sigma_n^{-1}\hat\Sigma_n a\|_r \leq \|\hat\Sigma_n^{-1}\|_{o,r}\|a\|_{\hat\Sigma_n, r} \leq B\|a\|_{\hat\Sigma_n, r}$
under the event $A_n$. Therefore, Lemma B.2 implies that under the event $A_n$

inf —Q n (9) T inf 7 = Qn,p(@) ||B n || or x Z n p 

e£S nij (P) y/n 0eS n ,j(P) y/n 

> B~ l X inf \\Ep[p(X i ,6)*qt(Z i )]\\ r -BxZ n>P . (B.15) 

v£Sn,j (P) 

Moreover, since Cn < ( JlnVn )~ 1 ) definition (B.ll) implies that for j sufficiently large 
Vn 1 X inf ~3h({0}, © 0 n(-P)nR, || • ||e) -O(Cn) > —- 0(Cn) > —— .(B.16) 

0€Sn,j(P) T]nVn 'Hn^n 

Thus, S n j(P) C (© 0 n (P) H R) € , Assumption 4.2, and (B.16) imply for j large that 

inf \\E P [p(X i ,e)* q t(Z i )]\\r> ■ (B.17) 


Hence, we can conclude from results (B.14), (B.15), and (B.17) that we must have 


limsuplimsup sup P(r) n ~ct p(@ n FI R, @o n(P) FI R, || • 
Mtoo n-> oo PePo 


> 2 


My 


~ l 

< limsuplimsup sup > P( — (- ) < 2BZ n p + B/ n + r n 


Mtoo ti—» oo PgPq ypM B vn u n 


< limsuplimsup sup P(j—(-) < 2 BZ n ^p) , (B.18) 


Mtoo n—>-oc PeP 0 Vn^n 


where in the hnal inequality we exploited that the definition of r/ n implies (rj n u n ) 1 > r n 
and (r/nPn) -1 > £ n . Therefore, i? n) p G R+, Lemma B.2, and Markov’s inequality yield 

. 1 2^ _2 i 

limsuplimsup sup > P(— ,(--) < Z n , P ) 

Mtoo n-> oo PgPo^TTm 'In^n 

< limsuplimsup V 2~* X Wg * = Q ? (R19) 

Mtoo 71-s.oo j> M V n 

where in the final result we used r] n u n < y/n/kn' yf\og{k n )J n B n , and that 2 _ - 7 < 

oo. Hence, the first claim of the Theorem follows from (B.18) and (B.19). 
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To establish the second claim, define the event A n 3 = {Qo n (P)P\R C Q n f]R}. Since 
//(©On(-P) n R, O n nil, || ■ He) = 0 whenever A n 3 occurs, we obtain from Lemma 4.1 


limsuplimsup sup P(jj n dH{O n n R, ©o n (P) © R, || • ||e) > 2 M ) 

Mt°o n—>00 PeP 0 

= limsuplimsup sup P{rj n ~cl n(O n n R, ©on(-P) © R, || • ||e) > 2 A/ ) = 0 (B.20) 
Mtoo n— s-oo PePo 


due to result (48). Therefore, the second claim of the Theorem follows from (B.20). ■ 
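Lemma B.1. Let Assumption 3.2(i) hold, and define the class of functions

$$\mathcal{G}_n = \{f(x)q_{k,n,j}(z_j) : f \in \mathcal{F}_n, \ 1 \leq j \leq \mathcal{J}, \text{ and } 1 \leq k \leq k_{n,j}\} . \qquad (B.21)$$

Then, it follows that $N_{[\,]}(\epsilon, \mathcal{G}_n, \|\cdot\|_{L^2_P}) \leq k_n \times N_{[\,]}(\epsilon/B_n, \mathcal{F}_n, \|\cdot\|_{L^2_P})$ for all $P \in \mathbf{P}$.

PROOF: Note that by Assumption 3.2(i) we have $\sup_{P \in \mathbf{P}} \|q_{k,n,j}\|_{L^\infty_P} \leq B_n$ for all $1 \leq
j \leq \mathcal{J}$ and $1 \leq k \leq k_{n,j}$, and define $q^+_{k,n,j}(z_j) = q_{k,n,j}(z_j)1\{q_{k,n,j}(z_j) \geq 0\}$ and $q^-_{k,n,j}(z_j) =
q_{k,n,j}(z_j)1\{q_{k,n,j}(z_j) < 0\}$. If $\{[f_{i,l,P}, f_{i,u,P}]\}_i$ is a collection of brackets for $\mathcal{F}_n$ with

$$\int (f_{i,u,P} - f_{i,l,P})^2 dP \leq \epsilon^2 \qquad (B.22)$$

for all $i$, then it follows that the following collection of brackets covers the class $\mathcal{G}_n$

$$\{[q^+_{k,n,j}f_{i,l,P} + q^-_{k,n,j}f_{i,u,P}, \ q^-_{k,n,j}f_{i,l,P} + q^+_{k,n,j}f_{i,u,P}]\}_{i,k,j} . \qquad (B.23)$$

Moreover, since $|q_{k,n,j}| = q^+_{k,n,j} - q^-_{k,n,j}$ by construction, we also obtain from (B.22) that

$$\int (q^+_{k,n,j}f_{i,u,P} + q^-_{k,n,j}f_{i,l,P} - q^+_{k,n,j}f_{i,l,P} - q^-_{k,n,j}f_{i,u,P})^2 dP
= \int (f_{i,u,P} - f_{i,l,P})^2 |q_{k,n,j}|^2 dP \leq \epsilon^2 B_n^2 . \qquad (B.24)$$

Since there are $k_n \times N_{[\,]}(\epsilon, \mathcal{F}_n, \|\cdot\|_{L^2_P})$ brackets in (B.23), we can conclude from (B.24)

$$N_{[\,]}(\epsilon, \mathcal{G}_n, \|\cdot\|_{L^2_P}) \leq k_n \times N_{[\,]}\Big(\frac{\epsilon}{B_n}, \mathcal{F}_n, \|\cdot\|_{L^2_P}\Big) \qquad (B.25)$$

for all $P \in \mathbf{P}$, which establishes the claim of the Lemma. ■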

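Lemma B.2. Let $Q_{n,P}(\theta) = \|\sqrt{n}E_P[\rho(X_i, \theta) * q_n^{k_n}(Z_i)]\|_{\hat\Sigma_n, r}$, and Assumptions 3.1,
3.2(i), 3.3(ii)-(iii) hold. Then, for each $P \in \mathbf{P}$ there are random $Z_{n,P} \in \mathbf{R}_+$ with

$$\frac{1}{\sqrt{n}}|Q_n(\theta) - Q_{n,P}(\theta)| \leq \|\hat\Sigma_n\|_{o,r} \times Z_{n,P} , \qquad (B.26)$$

for all $\theta \in \Theta_n \cap R$ and in addition $\sup_{P \in \mathbf{P}} E_P[Z_{n,P}] = O(k_n^{1/r}\sqrt{\log(k_n)}J_nB_n/\sqrt{n})$.

PROOF: Let $\mathcal{G}_n = \{f(x)q_{k,n,j}(z_j) : f \in \mathcal{F}_n, \ 1 \leq j \leq \mathcal{J} \text{ and } 1 \leq k \leq k_{n,j}\}$. Note that
by Assumption 3.2(i), $\sup_{P \in \mathbf{P}} \|q_{k,n,j}\|_{L^\infty_P} \leq B_n$ for all $1 \leq j \leq \mathcal{J}$ and $1 \leq k \leq k_{n,j}$.
Hence, letting $F_n$ be the envelope for $\mathcal{F}_n$, as in Assumption 3.3(ii), it follows that
$G_n(v) = B_nF_n(v)$ is an envelope for $\mathcal{G}_n$ satisfying $\sup_{P \in \mathbf{P}} E_P[G_n^2(V_i)] < \infty$. Thus,

$$\sup_{P \in \mathbf{P}} E_P[\sup_{g \in \mathcal{G}_n} |\mathbb{G}_{n,P}g|] \lesssim \sup_{P \in \mathbf{P}} J_{[\,]}(\|G_n\|_{L^2_P}, \mathcal{G}_n, \|\cdot\|_{L^2_P}) \qquad (B.27)$$

by Theorem 2.14.2 in van der Vaart and Wellner (1996). Moreover, also notice that
Lemma B.1, the change of variables $u = \epsilon/B_n$ and $B_n \geq 1$ imply

$$\sup_{P \in \mathbf{P}} J_{[\,]}(\|G_n\|_{L^2_P}, \mathcal{G}_n, \|\cdot\|_{L^2_P}) \leq \sup_{P \in \mathbf{P}} \int_0^{\|G_n\|_{L^2_P}} \sqrt{1 + \log(k_nN_{[\,]}(\epsilon/B_n, \mathcal{F}_n, \|\cdot\|_{L^2_P}))} \, d\epsilon$$
$$\leq (1 + \sqrt{\log(k_n)})B_n \times \sup_{P \in \mathbf{P}} J_{[\,]}(\|F_n\|_{L^2_P}, \mathcal{F}_n, \|\cdot\|_{L^2_P}) = O(\sqrt{\log(k_n)}B_nJ_n) , \qquad (B.28)$$

where the final equality follows from Assumption 3.3(iii). Next define $Z_{n,P} \in \mathbf{R}_+$ by

$$Z_{n,P} = \frac{k_n^{1/r}}{\sqrt{n}} \times \sup_{g \in \mathcal{G}_n} |\mathbb{G}_{n,P}g| \qquad (B.29)$$

and note (B.27) and (B.28) imply $\sup_{P \in \mathbf{P}} E_P[Z_{n,P}] = O(k_n^{1/r}\sqrt{\log(k_n)}B_nJ_n/\sqrt{n})$ as
desired. Since we also have that $\|\mathbb{G}_{n,P}\rho(\cdot, \theta) * q_n^{k_n}\|_r \leq k_n^{1/r} \times \sup_{g \in \mathcal{G}_n} |\mathbb{G}_{n,P}g|$ for all
$\theta \in \Theta_n \cap R$ by definition of $\mathcal{G}_n$, we can in turn conclude by direct calculation

$$\frac{1}{\sqrt{n}}|Q_n(\theta) - Q_{n,P}(\theta)| \leq \frac{\|\hat\Sigma_n\|_{o,r}}{\sqrt{n}} \times \|\mathbb{G}_{n,P}\rho(\cdot, \theta) * q_n^{k_n}\|_r \leq \|\hat\Sigma_n\|_{o,r} \times Z_{n,P} , \qquad (B.30)$$

which establishes the claim of the Lemma. ■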

Lemma B.3. If Assumption 3.f holds, then there exists a constant B < oo such that 


lirninf inf P(T, 1 exists and max{||B n || 07 ., IIS n 1 ||or} < B) = 1 . (B.31) 

II >x: PeP 


PROOF: First note that by Assumption 3.4(iii) there exists a B < oo such that 

sup sup max{ 11 £ n (-P) 11 o,r, ||S n (P) -1 ||o,r} < ^ • (B.32) 

n>lPeP 2 

Next, let $I_n$ denote the $k_n \times k_n$ identity matrix and for each $P \in \mathbf{P}$ rewrite $\hat\Sigma_n$ as

$$\hat\Sigma_n = \Sigma_n(P)\{I_n - \Sigma_n(P)^{-1}(\Sigma_n(P) - \hat\Sigma_n)\} . \tag{B.33}$$

By Theorem 2.9 in Kress (1999), the matrix $\{I_n - \Sigma_n(P)^{-1}(\Sigma_n(P) - \hat\Sigma_n)\}$ is invertible and the operator norm of its inverse is bounded by two when $\|\Sigma_n(P)^{-1}(\Sigma_n(P) - \hat\Sigma_n)\|_{o,r} < 1/2$. Since by Assumption 3.4(ii) and the equality in (B.33) it follows that $\hat\Sigma_n$ is invertible if and only if $\{I_n - \Sigma_n(P)^{-1}(\Sigma_n(P) - \hat\Sigma_n)\}$ is invertible, we obtain that

$$P(\hat\Sigma_n^{-1} \text{ exists and } \|\{I_n - \Sigma_n(P)^{-1}(\Sigma_n(P) - \hat\Sigma_n)\}^{-1}\|_{o,r} < 2)$$
$$\geq P(\|\Sigma_n(P)^{-1}(\hat\Sigma_n - \Sigma_n(P))\|_{o,r} < \tfrac{1}{2}) \geq P(\|\hat\Sigma_n - \Sigma_n(P)\|_{o,r} < \tfrac{1}{B}) , \tag{B.34}$$

where we exploited $\|\Sigma_n(P)^{-1}(\hat\Sigma_n - \Sigma_n(P))\|_{o,r} \leq \|\Sigma_n(P)^{-1}\|_{o,r} \|\hat\Sigma_n - \Sigma_n(P)\|_{o,r}$ and (B.32). Hence, since $\|\Sigma_n(P)^{-1}\|_{o,r} < B/2$ for all $P \in \mathbf{P}$ and $n$, (B.33) and (B.34) yield

$$P(\hat\Sigma_n^{-1} \text{ exists and } \|\hat\Sigma_n^{-1}\|_{o,r} < B) \geq P(\|\hat\Sigma_n - \Sigma_n(P)\|_{o,r} < \tfrac{1}{B}) . \tag{B.35}$$

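Finally, since $\|\hat\Sigma_n\|_{o,r} \leq B/2 + \|\hat\Sigma_n - \Sigma_n(P)\|_{o,r}$ by (B.32), result (B.35) implies that

$$\liminf_{n\to\infty} \inf_{P\in\mathbf{P}} P(\hat\Sigma_n^{-1} \text{ exists and } \max\{\|\hat\Sigma_n\|_{o,r}, \|\hat\Sigma_n^{-1}\|_{o,r}\} < B)$$
$$\geq \liminf_{n\to\infty} \inf_{P\in\mathbf{P}} P(\|\hat\Sigma_n - \Sigma_n(P)\|_{o,r} < \min\{\tfrac{B}{2}, \tfrac{1}{B}\}) = 1 , \tag{B.36}$$

where the equality, and hence the Lemma, follows from Assumption 3.4(i). ■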
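The perturbation step behind (B.33)-(B.35) can be checked numerically. The sketch below is only an illustration under stated assumptions: the two 2 x 2 matrices are made-up values (not from the paper), and the sup-norm operator norm (largest absolute row sum, the case r = infinity) is used because it has a closed form. It verifies the factorization of the perturbed matrix, the Neumann-series bound on the inverse of the bracketed factor, and the resulting bound on the inverse of the perturbed matrix.

```python
def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(A):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

def op_norm_inf(A):
    """Operator norm induced by the sup norm: largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)

# Hypothetical population matrix and a nearby "estimate" (illustrative values).
sigma = [[2.0, 0.5], [0.5, 1.0]]
sigma_hat = [[2.1, 0.45], [0.55, 0.95]]

identity = [[1.0, 0.0], [0.0, 1.0]]
diff = [[sigma[i][j] - sigma_hat[i][j] for j in range(2)] for i in range(2)]
M = matmul(inv2(sigma), diff)                      # Sigma^{-1}(Sigma - Sigma_hat)
factor = [[identity[i][j] - M[i][j] for j in range(2)] for i in range(2)]

norm_M = op_norm_inf(M)
norm_factor_inv = op_norm_inf(inv2(factor))        # bounded by 1/(1 - norm_M)
rebuilt = matmul(sigma, factor)                    # should reproduce sigma_hat
norm_hat_inv = op_norm_inf(inv2(sigma_hat))        # bounded by 2 * ||Sigma^{-1}||
```

Because the norm of `M` is below 1/2 here, the inverse of the bracketed factor has norm at most two, which is exactly the property the proof borrows from Kress (1999).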

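Lemma B.4. If $a \in \mathbf{R}^d$, then $\|a\|_{\tilde r} \leq d^{(\frac{1}{\tilde r} - \frac{1}{r})_+} \|a\|_r$ for any $\tilde r, r \in [2, \infty]$.


PROOF: The case $r \leq \tilde r$ trivially follows from $\|a\|_{\tilde r} \leq \|a\|_r$ for all $a \in \mathbf{R}^d$. For the case $r > \tilde r$, let $a = (a^{(1)}, \ldots, a^{(d)})'$ and note that by Hölder's inequality we can obtain that

$$\|a\|_{\tilde r}^{\tilde r} = \sum_{i=1}^d |a^{(i)}|^{\tilde r} \leq \Big\{\sum_{i=1}^d (|a^{(i)}|^{\tilde r})^{\frac{r}{\tilde r}}\Big\}^{\frac{\tilde r}{r}} \Big\{\sum_{i=1}^d 1\Big\}^{1 - \frac{\tilde r}{r}} = \Big\{\sum_{i=1}^d |a^{(i)}|^{r}\Big\}^{\frac{\tilde r}{r}} d^{1 - \frac{\tilde r}{r}} . \tag{B.37}$$

Thus, the claim of the Lemma for $r > \tilde r$ follows from taking the $1/\tilde r$ power in (B.37). ■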
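As a quick sanity check of Lemma B.4, the sketch below compares the two norms on random vectors. It assumes the constant implied by the Hölder step in (B.37), namely d raised to the positive part of 1/r~ - 1/r; the helper names `r_norm` and `b4_constant` are hypothetical and introduced only for this illustration.

```python
import random

INF = float("inf")

def r_norm(a, r):
    """l_r norm of a sequence; r = INF gives the max norm."""
    if r == INF:
        return max(abs(x) for x in a)
    return sum(abs(x) ** r for x in a) ** (1.0 / r)

def b4_constant(d, r_tilde, r):
    """d ** max(1/r_tilde - 1/r, 0), reading 1/INF as 0."""
    inv = lambda p: 0.0 if p == INF else 1.0 / p
    return d ** max(inv(r_tilde) - inv(r), 0.0)

def check_b4(a, r_tilde, r, tol=1e-12):
    """True when ||a||_{r_tilde} <= constant * ||a||_r up to rounding."""
    lhs = r_norm(a, r_tilde)
    rhs = b4_constant(len(a), r_tilde, r) * r_norm(a, r)
    return lhs <= rhs * (1 + tol)

random.seed(0)
exponents = [2, 3, 4.5, INF]
ok = all(
    check_b4([random.gauss(0, 1) for _ in range(7)], rt, r)
    for _ in range(200) for rt in exponents for r in exponents
)

# A vector with equal entries attains the bound when r > r_tilde.
ones = [1.0] * 7
tight = abs(r_norm(ones, 2) - b4_constant(7, 2, 4) * r_norm(ones, 4)) < 1e-12
```

The equal-entries vector shows that the factor cannot be dropped when r exceeds r~, while for r at most r~ the constant reduces to one.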


Appendix C - Proofs for Section 5 


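Proof of Lemma 5.1: First note that the existence of the required sequence $\{\ell_n\}$ is guaranteed by Assumption 5.3(i). Next, for $\eta_n = o(a_n)$ let $\hat\theta_n \in \Theta_n \cap R$ satisfy

$$Q_n(\hat\theta_n) \leq \inf_{\theta\in\Theta_n\cap R} Q_n(\theta) + \eta_n . \tag{C.1}$$

Applying Theorem 4.1 with $\tau_n = \eta_n/\sqrt{n}$ and noting $\tau_n = o(k_n^{1/r}/\sqrt{n})$, then yields that

$$\overrightarrow{d}_H(\{\hat\theta_n\}, \Theta_{0n}(P) \cap R, \|\cdot\|_{\mathbf{E}}) = O_p(\mathcal{R}_n) \tag{C.2}$$

uniformly in $P \in \mathbf{P}_0$. Hence, defining for each $P \in \mathbf{P}_0$ the shrinking neighborhood $(\Theta_{0n}(P) \cap R)^{\ell_n} \equiv \{\theta \in \Theta_n \cap R : \overrightarrow{d}_H(\{\theta\}, \Theta_{0n}(P) \cap R, \|\cdot\|_{\mathbf{E}}) < \ell_n\}$, we obtain

$$I_n(R) = \inf_{\theta\in(\Theta_{0n}(P)\cap R)^{\ell_n}} Q_n(\theta) + o_p(a_n) \tag{C.3}$$

uniformly in $P \in \mathbf{P}_0$ due to $\mathcal{R}_n = o(\ell_n)$, $\eta_n = o(a_n)$, and results (C.1) and (C.2). Defining

$$Q^0_{n,P}(\theta) \equiv \|\mathbb{W}_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \sqrt{n} P\rho(\cdot,\theta) * q_n^{k_n}\|_{\hat\Sigma_n, r} \tag{C.4}$$

we also obtain from Assumption 5.1 and Lemmas B.3 and C.1 that uniformly in $P \in \mathbf{P}_0$

$$\Big| \inf_{\theta\in(\Theta_{0n}(P)\cap R)^{\ell_n}} Q_n(\theta) - \inf_{\theta\in(\Theta_{0n}(P)\cap R)^{\ell_n}} Q^0_{n,P}(\theta) \Big| \leq J \|\hat\Sigma_n\|_{o,r} \times \sup_{f\in\mathcal{F}_n} \|\mathbb{G}_{n,P} f q_n^{k_n} - \mathbb{W}_{n,P} f q_n^{k_n}\|_r = o_p(a_n) . \tag{C.5}$$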

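Similarly, exploiting Lemmas B.3 and C.1 together with Lemma C.2 and $\ell_n$ satisfying $k_n^{1/r}\sqrt{\log(k_n)} B_n \times \sup_{P\in\mathbf{P}} J_{[\,]}(\ell_n^{\kappa_\rho}, \mathcal{F}_n, \|\cdot\|_{L^2_P}) = o(a_n)$ by hypothesis yields

$$\inf_{\theta_0\in\Theta_{0n}(P)\cap R} \inf_{\frac{h}{\sqrt n}\in V_n(\theta_0,\ell_n)} \|\mathbb{W}_{n,P}\rho(\cdot,\theta_0 + \tfrac{h}{\sqrt n}) * q_n^{k_n} + \sqrt{n} P\rho(\cdot,\theta_0 + \tfrac{h}{\sqrt n}) * q_n^{k_n}\|_{\hat\Sigma_n, r}$$
$$= \inf_{\theta_0\in\Theta_{0n}(P)\cap R} \inf_{\frac{h}{\sqrt n}\in V_n(\theta_0,\ell_n)} \|\mathbb{W}_{n,P}\rho(\cdot,\theta_0) * q_n^{k_n} + \sqrt{n} P\rho(\cdot,\theta_0 + \tfrac{h}{\sqrt n}) * q_n^{k_n}\|_{\hat\Sigma_n, r} + o_p(a_n) \tag{C.6}$$

uniformly in $P \in \mathbf{P}_0$. Thus, the Lemma follows from results (C.3), (C.5), and (C.6) together with Lemma C.3. ■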

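Proof of Theorem 5.1: First we note that Assumption 4.1(i) implies

$$\sup_{P\in\mathbf{P}_0} \sup_{\theta_0\in\Theta_{0n}(P)\cap R} \|\sqrt{n} P\rho(\cdot,\theta_0) * q_n^{k_n}\|_r \leq \sqrt{n}\zeta_n . \tag{C.7}$$

Therefore, Lemma B.4, result (C.7), and the law of iterated expectations yield that for all $P \in \mathbf{P}_0$, $\theta_0 \in \Theta_{0n}(P) \cap R$, and $h/\sqrt{n} \in V_n(\theta_0, \ell_n)$ we must have

$$\|\sqrt{n} P\rho(\cdot,\theta_0 + \tfrac{h}{\sqrt n}) * q_n^{k_n} - \mathbb{D}_{n,P}(\theta_0)[h]\|_r \leq \|\sqrt{n}\{P\rho(\cdot,\theta_0 + \tfrac{h}{\sqrt n}) * q_n^{k_n} - P\rho(\cdot,\theta_0) * q_n^{k_n}\} - \mathbb{D}_{n,P}(\theta_0)[h]\|_2 + \sqrt{n}\zeta_n . \tag{C.8}$$

Moreover, Lemma C.5 and the maps $m_{P,j} : \mathbf{B}_n \to L^2_P$ satisfying Assumption 5.4(i) imply

$$\sum_{j=1}^J \sum_{k=1}^{k_{n,j}} \langle \sqrt{n}\{m_{P,j}(\theta_0 + \tfrac{h}{\sqrt n}) - m_{P,j}(\theta_0)\} - \nabla m_{P,j}(\theta_0)[h], q_{k,n,j}\rangle^2_{L^2_P}$$
$$\leq \sum_{j=1}^J C_0 \|\sqrt{n}\{m_{P,j}(\theta_0 + \tfrac{h}{\sqrt n}) - m_{P,j}(\theta_0) - \nabla m_{P,j}(\theta_0)[\tfrac{h}{\sqrt n}]\}\|^2_{L^2_P} \leq \sum_{j=1}^J C_0 K_m^2 \times n \times \|\tfrac{h}{\sqrt n}\|^2_{\mathbf{L}} \times \|\tfrac{h}{\sqrt n}\|^2_{\mathbf{E}} \tag{C.9}$$

for some constant $C_0 < \infty$ and all $P \in \mathbf{P}_0$, $\theta_0 \in \Theta_{0n}(P) \cap R$, and $h/\sqrt{n} \in V_n(\theta_0, \ell_n)$. Therefore, by results (C.8) and (C.9), and the definition of $\mathcal{S}_n(\mathbf{L}, \mathbf{E})$ in (35) we get

$$\sup_{P\in\mathbf{P}_0} \sup_{\theta_0\in\Theta_{0n}(P)\cap R} \sup_{\frac{h}{\sqrt n}\in V_n(\theta_0,\ell_n)} \|\sqrt{n} P\rho(\cdot,\theta_0 + \tfrac{h}{\sqrt n}) * q_n^{k_n} - \mathbb{D}_{n,P}(\theta_0)[h]\|_r$$
$$\leq \sqrt{J C_0} K_m \times \sqrt{n} \times \ell_n^2 \times \mathcal{S}_n(\mathbf{L}, \mathbf{E}) + \sqrt{n}\zeta_n = o(a_n) \tag{C.10}$$

due to $K_m \ell_n^2 \times \mathcal{S}_n(\mathbf{L}, \mathbf{E}) = o(a_n n^{-1/2})$ by hypothesis and $\sqrt{n}\zeta_n = o(a_n)$ by Assumption 5.3(ii). Next, note that since $k_n^{1/r}\sqrt{\log(k_n)} B_n \times \sup_{P\in\mathbf{P}} J_{[\,]}(\ell_n^{\kappa_\rho}, \mathcal{F}_n, \|\cdot\|_{L^2_P}) = o(a_n)$, Assumption 5.3(i) implies there is a sequence $\tilde\ell_n$ satisfying the conditions of Lemma 5.1 and $\ell_n = o(\tilde\ell_n)$. Therefore, applying Lemma 5.1 we obtain that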


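$$I_n(R) = \inf_{\theta_0\in\Theta_{0n}(P)\cap R} \inf_{\frac{h}{\sqrt n}\in V_n(\theta_0,\tilde\ell_n)} \|\mathbb{W}_{n,P}\rho(\cdot,\theta_0) * q_n^{k_n} + \sqrt{n} P\rho(\cdot,\theta_0 + \tfrac{h}{\sqrt n}) * q_n^{k_n}\|_{\Sigma_n(P), r} + o_p(a_n) . \tag{C.11}$$

Moreover, since $\ell_n = o(\tilde\ell_n)$ implies $V_n(\theta, \ell_n) \subseteq V_n(\theta, \tilde\ell_n)$ for all $\theta \in \Theta_n \cap R$, we have

$$\inf_{\theta_0\in\Theta_{0n}(P)\cap R} \inf_{\frac{h}{\sqrt n}\in V_n(\theta_0,\tilde\ell_n)} \|\mathbb{W}_{n,P}\rho(\cdot,\theta_0) * q_n^{k_n} + \sqrt{n} P\rho(\cdot,\theta_0 + \tfrac{h}{\sqrt n}) * q_n^{k_n}\|_{\Sigma_n(P), r}$$
$$\leq \inf_{\theta_0\in\Theta_{0n}(P)\cap R} \inf_{\frac{h}{\sqrt n}\in V_n(\theta_0,\ell_n)} \|\mathbb{W}_{n,P}\rho(\cdot,\theta_0) * q_n^{k_n} + \sqrt{n} P\rho(\cdot,\theta_0 + \tfrac{h}{\sqrt n}) * q_n^{k_n}\|_{\Sigma_n(P), r}$$
$$= \inf_{\theta_0\in\Theta_{0n}(P)\cap R} \inf_{\frac{h}{\sqrt n}\in V_n(\theta_0,\ell_n)} \|\mathbb{W}_{n,P}\rho(\cdot,\theta_0) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_0)[h]\|_{\Sigma_n(P), r} + o_p(a_n) \tag{C.12}$$

uniformly in $P \in \mathbf{P}_0$, with the final equality following from (C.10), Assumption 3.4(iii) and Lemma C.1. Thus, the first claim of the Theorem follows from (C.11) and (C.12), while the second follows by noting that if $K_m \mathcal{R}_n^2 \times \mathcal{S}_n(\mathbf{L}, \mathbf{E}) = o(a_n n^{-1/2})$, then we may set $\ell_n$ to simultaneously satisfy the conditions of Lemma 5.1 and $K_m \ell_n^2 \times \mathcal{S}_n(\mathbf{L}, \mathbf{E}) = o(a_n n^{-1/2})$, which obviates the need to introduce $\tilde\ell_n$ in (C.11) and (C.12). ■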

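Lemma C.1. If $\Lambda$ is a set, $A : \Lambda \to \mathbf{R}^k$, $B : \Lambda \to \mathbf{R}^k$, and $W$ is a $k \times k$ matrix, then
$$\Big| \inf_{\lambda\in\Lambda} \|WA(\lambda)\|_r - \inf_{\lambda\in\Lambda} \|WB(\lambda)\|_r \Big| \leq \|W\|_{o,r} \times \sup_{\lambda\in\Lambda} \|A(\lambda) - B(\lambda)\|_r .$$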






PROOF: Fix $\eta > 0$, and let $\lambda_a \in \Lambda$ satisfy $\|WA(\lambda_a)\|_r \leq \inf_{\lambda\in\Lambda} \|WA(\lambda)\|_r + \eta$. Then,

$$\inf_{\lambda\in\Lambda} \|WB(\lambda)\|_r - \inf_{\lambda\in\Lambda} \|WA(\lambda)\|_r \leq \|WB(\lambda_a)\|_r - \|WA(\lambda_a)\|_r + \eta$$
$$\leq \|W\{B(\lambda_a) - A(\lambda_a)\}\|_r + \eta \leq \|W\|_{o,r} \times \sup_{\lambda\in\Lambda} \|A(\lambda) - B(\lambda)\|_r + \eta \tag{C.13}$$

where the second result follows from the triangle inequality, and the final result from $\|Wv\|_r \leq \|W\|_{o,r} \|v\|_r$ for any $v \in \mathbf{R}^k$. In turn, by identical manipulations we also have

$$\inf_{\lambda\in\Lambda} \|WA(\lambda)\|_r - \inf_{\lambda\in\Lambda} \|WB(\lambda)\|_r \leq \|W\|_{o,r} \times \sup_{\lambda\in\Lambda} \|A(\lambda) - B(\lambda)\|_r + \eta . \tag{C.14}$$

Thus, since $\eta$ was arbitrary, the Lemma follows from results (C.13) and (C.14). ■
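The inequality in Lemma C.1 is easy to probe numerically. The following is a minimal sketch, assuming a finite index set and the sup norm (r = infinity), for which the induced operator norm is the largest absolute row sum; the random matrix and the two families of vectors are made-up inputs, and `lemma_c1_gap` is a hypothetical helper name.

```python
import random

def matvec(W, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sup_norm(v):
    return max(abs(x) for x in v)

def op_norm_inf(W):
    """Operator norm induced by the sup norm: largest absolute row sum."""
    return max(sum(abs(w) for w in row) for row in W)

def lemma_c1_gap(W, A_vals, B_vals):
    """Return (lhs, rhs) of Lemma C.1 for finite families A(lambda), B(lambda)."""
    inf_A = min(sup_norm(matvec(W, a)) for a in A_vals)
    inf_B = min(sup_norm(matvec(W, b)) for b in B_vals)
    sup_diff = max(sup_norm([x - y for x, y in zip(a, b)])
                   for a, b in zip(A_vals, B_vals))
    return abs(inf_A - inf_B), op_norm_inf(W) * sup_diff

random.seed(1)
k, n_lambda = 4, 50
W = [[random.uniform(-1, 1) for _ in range(k)] for _ in range(k)]
A_vals = [[random.gauss(0, 1) for _ in range(k)] for _ in range(n_lambda)]
# B is a small perturbation of A at every index value.
B_vals = [[x + random.uniform(-0.1, 0.1) for x in a] for a in A_vals]
lhs, rhs = lemma_c1_gap(W, A_vals, B_vals)
```

The lemma is what lets the proofs swap one process for another inside an infimum at the cost of a uniform approximation error.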

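Lemma C.2. Let Assumptions 3.2(i), 3.4, and 5.2(i) hold. If $\delta_n \downarrow 0$ is such that $k_n^{1/r}\sqrt{\log(k_n)} B_n \times \sup_{P\in\mathbf{P}} J_{[\,]}(\delta_n^{\kappa_\rho}, \mathcal{F}_n, \|\cdot\|_{L^2_P}) = o(a_n)$, then uniformly in $P \in \mathbf{P}$:

$$\sup_{\theta_0\in\Theta_{0n}(P)\cap R} \sup_{\frac{h}{\sqrt n}\in V_n(\theta_0,\delta_n)} \|\mathbb{W}_{n,P}\rho(\cdot,\theta_0 + \tfrac{h}{\sqrt n}) * q_n^{k_n} - \mathbb{W}_{n,P}\rho(\cdot,\theta_0) * q_n^{k_n}\|_{\hat\Sigma_n, r} = o_p(a_n) .$$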


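PROOF: Since $\|q_{k,n,j}\|_{L^\infty_P} \leq B_n$ for all $1 \leq j \leq J$ and $1 \leq k \leq k_{n,j}$ by Assumption 3.2(i), Assumption 5.2(i) yields for any $P \in \mathbf{P}$, $\theta \in \Theta_n \cap R$, and $h/\sqrt{n} \in V_n(\theta, \delta_n)$ that

$$E_P[\|\rho(X_i, \theta + \tfrac{h}{\sqrt n}) - \rho(X_i, \theta)\|_2^2\, q_{k,n,j}^2(Z_{i,j})] \leq K_\rho^2 B_n^2 \|\tfrac{h}{\sqrt n}\|_{\mathbf{E}}^{2\kappa_\rho} \leq K_\rho^2 B_n^2 \delta_n^{2\kappa_\rho} . \tag{C.15}$$

Next, define the class of functions $\mathcal{G}_n \equiv \{f(x) q_{k,n,j}(z_j) \text{ for some } f \in \mathcal{F}_n, 1 \leq j \leq J \text{ and } 1 \leq k \leq k_{n,j}\}$, and note that (C.15) implies that for all $1 \leq j \leq J$ and $P \in \mathbf{P}$

$$\sup_{\theta_0\in\Theta_{0n}(P)\cap R} \sup_{\frac{h}{\sqrt n}\in V_n(\theta_0,\delta_n)} \max_{1\leq k\leq k_{n,j}} |\mathbb{W}_{n,P}\rho_j(\cdot,\theta_0 + \tfrac{h}{\sqrt n}) q_{k,n,j} - \mathbb{W}_{n,P}\rho_j(\cdot,\theta_0) q_{k,n,j}|$$
$$\leq \sup_{g_1,g_2\in\mathcal{G}_n : \|g_1 - g_2\|_{L^2_P} \leq K_\rho B_n \delta_n^{\kappa_\rho}} |\mathbb{W}_{n,P}\, g_1 - \mathbb{W}_{n,P}\, g_2| . \tag{C.16}$$

Hence, since $\|\hat\Sigma_n a\|_r \leq \|\hat\Sigma_n\|_{o,r} \|a\|_r \leq \|\hat\Sigma_n\|_{o,r} k_n^{1/r} \|a\|_\infty$ for any $a \in \mathbf{R}^{k_n}$, (C.16) yields

$$\sup_{\theta_0\in\Theta_{0n}(P)\cap R} \sup_{\frac{h}{\sqrt n}\in V_n(\theta_0,\delta_n)} \|\mathbb{W}_{n,P}\rho(\cdot,\theta_0 + \tfrac{h}{\sqrt n}) * q_n^{k_n} - \mathbb{W}_{n,P}\rho(\cdot,\theta_0) * q_n^{k_n}\|_{\hat\Sigma_n, r}$$
$$\leq \|\hat\Sigma_n\|_{o,r} k_n^{1/r} \sup_{g_1,g_2\in\mathcal{G}_n : \|g_1 - g_2\|_{L^2_P} \leq K_\rho B_n \delta_n^{\kappa_\rho}} |\mathbb{W}_{n,P}\, g_1 - \mathbb{W}_{n,P}\, g_2| . \tag{C.17}$$

Moreover, Corollary 2.2.8 in van der Vaart and Wellner (1996) implies that

$$\sup_{P\in\mathbf{P}} E_P\Big[\sup_{g_1,g_2\in\mathcal{G}_n : \|g_1 - g_2\|_{L^2_P} \leq K_\rho B_n \delta_n^{\kappa_\rho}} |\mathbb{W}_{n,P}\, g_1 - \mathbb{W}_{n,P}\, g_2|\Big] \leq \sup_{P\in\mathbf{P}} C_0 \int_0^{K_\rho B_n \delta_n^{\kappa_\rho}} \sqrt{\log N_{[\,]}(\epsilon/2, \mathcal{G}_n, \|\cdot\|_{L^2_P})}\, d\epsilon \tag{C.18}$$

for some $C_0 < \infty$. In turn, Lemma B.1 and the change of variables $u = \epsilon/2B_n$ yields

$$\sup_{P\in\mathbf{P}} \int_0^{K_\rho B_n \delta_n^{\kappa_\rho}} \sqrt{\log N_{[\,]}(\epsilon/2, \mathcal{G}_n, \|\cdot\|_{L^2_P})}\, d\epsilon \leq \sup_{P\in\mathbf{P}} \sqrt{\log(k_n)} \int_0^{K_\rho B_n \delta_n^{\kappa_\rho}} \sqrt{1 + \log N_{[\,]}(\epsilon/2B_n, \mathcal{F}_n, \|\cdot\|_{L^2_P})}\, d\epsilon$$
$$\leq \sup_{P\in\mathbf{P}} 2\sqrt{\log(k_n)} B_n \int_0^{K_\rho \delta_n^{\kappa_\rho}/2} \sqrt{1 + \log N_{[\,]}(u, \mathcal{F}_n, \|\cdot\|_{L^2_P})}\, du . \tag{C.19}$$

However, since $N_{[\,]}(\epsilon, \mathcal{F}_n, \|\cdot\|_{L^2_P})$ is a decreasing function of $\epsilon$, we can also conclude

$$\sup_{P\in\mathbf{P}} \int_0^{K_\rho \delta_n^{\kappa_\rho}/2} \sqrt{1 + \log N_{[\,]}(u, \mathcal{F}_n, \|\cdot\|_{L^2_P})}\, du \leq \max\{\tfrac{K_\rho}{2}, 1\} \times \sup_{P\in\mathbf{P}} J_{[\,]}(\delta_n^{\kappa_\rho}, \mathcal{F}_n, \|\cdot\|_{L^2_P}) \tag{C.20}$$

by definition of $J_{[\,]}(\delta, \mathcal{F}_n, \|\cdot\|_{L^2_P})$. Therefore, the Lemma follows from (C.17), $\|\hat\Sigma_n\|_{o,r} = O_p(1)$ by Lemma B.3, and Markov's inequality combined with results (C.18), (C.19), (C.20), and $k_n^{1/r}\sqrt{\log(k_n)} B_n \times \sup_{P\in\mathbf{P}} J_{[\,]}(\delta_n^{\kappa_\rho}, \mathcal{F}_n, \|\cdot\|_{L^2_P}) = o(a_n)$ by hypothesis. ■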
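Step (C.20) rests on an elementary fact: for a nonnegative decreasing integrand, stretching the upper limit by a factor c inflates the integral by at most max{c, 1}. The sketch below illustrates this numerically; the integrand is a stand-in chosen for illustration (it assumes a bracketing number behaving like 1/u, which is not a claim about the paper's function class), and the midpoint rule is used so the integrable singularity at zero is never evaluated.

```python
import math

def entropy_integrand(u):
    """Stand-in for sqrt(1 + log N(u)), assuming N(u) = max(1/u, 1)."""
    return math.sqrt(1.0 + max(math.log(1.0 / u), 0.0))

def midpoint_integral(f, upper, steps=20000):
    """Midpoint-rule approximation of the integral of f over (0, upper)."""
    width = upper / steps
    return width * sum(f((i + 0.5) * width) for i in range(steps))

def stretched_vs_bound(c, delta):
    """Return (integral over (0, c*delta), max(c, 1) * integral over (0, delta))."""
    lhs = midpoint_integral(entropy_integrand, c * delta)
    rhs = max(c, 1.0) * midpoint_integral(entropy_integrand, delta)
    return lhs, rhs

# c plays the role of K_rho / 2 and delta the role of delta_n ** kappa_rho.
results = {c: stretched_vs_bound(c, 0.2) for c in (0.5, 1.0, 3.0)}
```

For c below one the left side is simply an integral over a shorter interval, and for c above one each additional block of length delta contributes no more than the first block, which is the monotonicity argument used in the proof.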

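Lemma C.3. Let Assumptions 3.2(i), 3.3(ii), 3.4, 4.1(i), and 5.3(ii)-(iii) hold. For any sequence $\delta_n \downarrow 0$ it then follows that uniformly in $P \in \mathbf{P}_0$ we have

$$\inf_{\theta_0\in\Theta_{0n}(P)\cap R} \inf_{\frac{h}{\sqrt n}\in V_n(\theta_0,\delta_n)} \|\mathbb{W}_{n,P}\rho(\cdot,\theta_0) * q_n^{k_n} + \sqrt{n} P\rho(\cdot,\theta_0 + \tfrac{h}{\sqrt n}) * q_n^{k_n}\|_{\Sigma_n(P), r}$$
$$= \inf_{\theta_0\in\Theta_{0n}(P)\cap R} \inf_{\frac{h}{\sqrt n}\in V_n(\theta_0,\delta_n)} \|\mathbb{W}_{n,P}\rho(\cdot,\theta_0) * q_n^{k_n} + \sqrt{n} P\rho(\cdot,\theta_0 + \tfrac{h}{\sqrt n}) * q_n^{k_n}\|_{\hat\Sigma_n, r} + o_p(a_n) .$$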


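PROOF: First note that by Assumption 3.4(ii) there exists a constant $C_0 < \infty$ such that $\max\{\|\Sigma_n(P)\|_{o,r}, \|\Sigma_n(P)^{-1}\|_{o,r}\} \leq C_0$ for all $n$ and $P \in \mathbf{P}$. Thus, we obtain

$$\|\mathbb{W}_{n,P}\rho(\cdot,\theta_0) * q_n^{k_n} + \sqrt{n} P\rho(\cdot,\theta_0 + \tfrac{h}{\sqrt n}) * q_n^{k_n}\|_{\hat\Sigma_n, r}$$
$$\leq \{C_0 \|\hat\Sigma_n - \Sigma_n(P)\|_{o,r} + 1\} \|\mathbb{W}_{n,P}\rho(\cdot,\theta_0) * q_n^{k_n} + \sqrt{n} P\rho(\cdot,\theta_0 + \tfrac{h}{\sqrt n}) * q_n^{k_n}\|_{\Sigma_n(P), r} \tag{C.21}$$

by the triangle inequality. Moreover, since $0 \in V_n(\theta_0, \delta_n)$ for all $\theta_0$, we also have that

$$\inf_{\theta_0\in\Theta_{0n}(P)\cap R} \inf_{\frac{h}{\sqrt n}\in V_n(\theta_0,\delta_n)} \|\mathbb{W}_{n,P}\rho(\cdot,\theta_0) * q_n^{k_n} + \sqrt{n} P\rho(\cdot,\theta_0 + \tfrac{h}{\sqrt n}) * q_n^{k_n}\|_{\Sigma_n(P), r}$$
$$\leq C_0 \times \inf_{\theta_0\in\Theta_{0n}(P)\cap R} \|\mathbb{W}_{n,P}\rho(\cdot,\theta_0) * q_n^{k_n} + \sqrt{n} P\rho(\cdot,\theta_0) * q_n^{k_n}\|_r . \tag{C.22}$$

Hence, Lemma C.4, Markov's inequality, and Assumptions 4.1(i) and 5.3(ii) establish

$$\inf_{\theta_0\in\Theta_{0n}(P)\cap R} \|\mathbb{W}_{n,P}\rho(\cdot,\theta_0) * q_n^{k_n} + \sqrt{n} P\rho(\cdot,\theta_0) * q_n^{k_n}\|_r \leq \sup_{\theta\in\Theta_n\cap R} \|\mathbb{W}_{n,P}\rho(\cdot,\theta) * q_n^{k_n}\|_r + o(1) = O_p(k_n^{1/r}\sqrt{\log(k_n)} B_n J_n) \tag{C.23}$$

uniformly in $P \in \mathbf{P}_0$. Therefore, (C.21), (C.22), (C.23), and Assumption 5.3(iii) imply

$$\inf_{\theta_0\in\Theta_{0n}(P)\cap R} \inf_{\frac{h}{\sqrt n}\in V_n(\theta_0,\delta_n)} \|\mathbb{W}_{n,P}\rho(\cdot,\theta_0) * q_n^{k_n} + \sqrt{n} P\rho(\cdot,\theta_0 + \tfrac{h}{\sqrt n}) * q_n^{k_n}\|_{\hat\Sigma_n, r}$$
$$\leq \inf_{\theta_0\in\Theta_{0n}(P)\cap R} \inf_{\frac{h}{\sqrt n}\in V_n(\theta_0,\delta_n)} \|\mathbb{W}_{n,P}\rho(\cdot,\theta_0) * q_n^{k_n} + \sqrt{n} P\rho(\cdot,\theta_0 + \tfrac{h}{\sqrt n}) * q_n^{k_n}\|_{\Sigma_n(P), r} + o_p(a_n) \tag{C.24}$$

uniformly in $P \in \mathbf{P}_0$. The reverse inequality to (C.24) follows by identical arguments but relying on Lemma B.3 implying $\|\hat\Sigma_n\|_{o,r} = O_p(1)$ and $\|\hat\Sigma_n^{-1}\|_{o,r} = O_p(1)$ uniformly in $P \in \mathbf{P}$ rather than on $\max\{\|\Sigma_n(P)\|_{o,r}, \|\Sigma_n(P)^{-1}\|_{o,r}\} \leq C_0$. ■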

Lemma C.4. If Assumptions 3.2(i) and 3.3(ii)-(iii) hold, then for some $K_0 > 0$,

$$\sup_{P\in\mathbf{P}} E_P\Big[\sup_{\theta\in\Theta_n\cap R} \|\mathbb{W}_{n,P}\rho(\cdot,\theta) * q_n^{k_n}\|_r\Big] \leq K_0 k_n^{1/r}\sqrt{\log(k_n)} B_n J_n .$$


PROOF: Define the class $\mathcal{G}_n \equiv \{f(x) q_{k,n,j}(z_j) : f \in \mathcal{F}_n, 1 \leq j \leq J, \text{ and } 1 \leq k \leq k_{n,j}\}$, and note $\|a\|_r \leq d^{1/r} \|a\|_\infty$ for any $a \in \mathbf{R}^d$ implies that for any $P \in \mathbf{P}$ we have

$$E_P\Big[\sup_{\theta\in\Theta_n\cap R} \|\mathbb{W}_{n,P}\rho(\cdot,\theta) * q_n^{k_n}\|_r\Big] \leq k_n^{1/r} E_P\Big[\sup_{g\in\mathcal{G}_n} |\mathbb{W}_{n,P}\, g|\Big]$$
$$\leq k_n^{1/r} \Big\{E_P[|\mathbb{W}_{n,P}\, g_0|] + C_1 \int_0^\infty \sqrt{\log N_{[\,]}(\epsilon/2, \mathcal{G}_n, \|\cdot\|_{L^2_P})}\, d\epsilon\Big\} , \tag{C.25}$$

where the final inequality holds for any $g_0 \in \mathcal{G}_n$ and some $C_1 < \infty$ by Corollary 2.2.8 in van der Vaart and Wellner (1996). Next, let $G_n(v) \equiv B_n F_n(v)$ for $F_n$ as in Assumption 3.3(ii) and note Assumption 3.2(i) implies $G_n$ is an envelope for $\mathcal{G}_n$. Thus $[-G_n, G_n]$ is a bracket of size $2\|G_n\|_{L^2_P}$ covering $\mathcal{G}_n$, and hence the change of variables $u = \epsilon/2$ yields

$$\int_0^\infty \sqrt{\log N_{[\,]}(\epsilon/2, \mathcal{G}_n, \|\cdot\|_{L^2_P})}\, d\epsilon \leq 2 \int_0^{\|G_n\|_{L^2_P}} \sqrt{1 + \log N_{[\,]}(u, \mathcal{G}_n, \|\cdot\|_{L^2_P})}\, du \leq C_2 \sqrt{\log(k_n)} B_n J_n , \tag{C.26}$$

where the final inequality holds for some $C_2 < \infty$ by result (B.28) and $N_{[\,]}(u, \mathcal{G}_n, \|\cdot\|_{L^2_P})$ being decreasing in $u$. Furthermore, since $E_P[|\mathbb{W}_{n,P}\, g_0|] \leq \|g_0\|_{L^2_P} \leq \|G_n\|_{L^2_P}$ we have

$$E_P[|\mathbb{W}_{n,P}\, g_0|] \leq \|G_n\|_{L^2_P} \leq \int_0^{\|G_n\|_{L^2_P}} \sqrt{1 + \log N_{[\,]}(u, \mathcal{G}_n, \|\cdot\|_{L^2_P})}\, du . \tag{C.27}$$

Thus, the claim of the Lemma follows from (C.25), (C.26), and (C.27). ■

Lemma C.5. Let Assumption 3.2(ii) hold. It then follows that there exists a constant $C < \infty$ such that for all $P \in \mathbf{P}$, $n \geq 1$, $1 \leq j \leq J$, and functions $f \in L^2_P$ we have

$$\sum_{k=1}^{k_{n,j}} \langle f, q_{k,n,j}\rangle^2_{L^2_P} \leq C E_P[(E_P[f(V_i)|Z_{i,j}])^2] . \tag{C.28}$$



PROOF: Let $L^2_P(Z_{i,j})$ denote the subspace of $L^2_P$ consisting of functions depending on $Z_{i,j}$ only, and set $\ell^2(\mathbf{N}) \equiv \{\{c_k\}_{k=1}^\infty : c_k \in \mathbf{R} \text{ and } \|\{c_k\}\|_{\ell^2(\mathbf{N})} < \infty\}$, where $\|\{c_k\}\|^2_{\ell^2(\mathbf{N})} = \sum_k c_k^2$. For any sequence $\{c_k\} \in \ell^2(\mathbf{N})$, then define the map $J_{P,n,j} : \ell^2(\mathbf{N}) \to L^2_P(Z_{i,j})$ by

$$J_{P,n,j}\{c_k\}(z) = \sum_{k=1}^{k_{n,j}} c_k q_{k,n,j}(z) . \tag{C.29}$$

Clearly, the maps $J_{P,n,j} : \ell^2(\mathbf{N}) \to L^2_P(Z_{i,j})$ are linear, and moreover we note that by Assumption 3.2(ii) there exists a constant $C < \infty$ such that the largest eigenvalue of $E_P[q_{n,j}^{k_{n,j}}(Z_{i,j}) q_{n,j}^{k_{n,j}}(Z_{i,j})']$ is bounded by $C$ for all $n \geq 1$ and $P \in \mathbf{P}$. Therefore, we obtain

$$\sup_{P\in\mathbf{P}} \sup_{n\geq 1} \|J_{P,n,j}\|_o^2 = \sup_{P\in\mathbf{P}} \sup_{n\geq 1} \sup_{\{c_k\}: \sum_k c_k^2 = 1} \|J_{P,n,j}\{c_k\}\|^2_{L^2_P(Z_{i,j})}$$
$$= \sup_{P\in\mathbf{P}} \sup_{n\geq 1} \sup_{\{c_k\}: \sum_k c_k^2 = 1} E_P\Big[\Big(\sum_{k=1}^{k_{n,j}} c_k q_{k,n,j}(Z_{i,j})\Big)^2\Big] \leq \sup_{\{c_k\}: \sum_k c_k^2 = 1} C \sum_{k=1}^\infty c_k^2 = C \tag{C.30}$$

which implies $J_{P,n,j}$ is continuous. Next, define $J^*_{P,n,j} : L^2_P(Z_{i,j}) \to \ell^2(\mathbf{N})$ to be given by

$$J^*_{P,n,j}\, g = \{a_k(g)\}_{k=1}^\infty , \qquad a_k(g) \equiv \begin{cases} \langle g, q_{k,n,j}\rangle_{L^2_P(Z_{i,j})} & \text{if } k \leq k_{n,j} \\ 0 & \text{if } k > k_{n,j} \end{cases} \tag{C.31}$$

and note $J^*_{P,n,j}$ is the adjoint of $J_{P,n,j}$. Therefore, since $\|J_{P,n,j}\|_o = \|J^*_{P,n,j}\|_o$ by Theorem 6.5.1 in Luenberger (1969), we obtain for any $P \in \mathbf{P}$, $n \geq 1$, and $g \in L^2_P(Z_{i,j})$

$$\sum_{k=1}^{k_{n,j}} \langle g, q_{k,n,j}\rangle^2_{L^2_P(Z_{i,j})} = \|J^*_{P,n,j}\, g\|^2_{\ell^2(\mathbf{N})} \leq \|J^*_{P,n,j}\|_o^2 \|g\|^2_{L^2_P(Z_{i,j})} = \|J_{P,n,j}\|_o^2 \|g\|^2_{L^2_P(Z_{i,j})} . \tag{C.32}$$

Therefore, since $E_P[f(V_i) q_{k,n,j}(Z_{i,j})] = E_P[E_P[f(V_i)|Z_{i,j}] q_{k,n,j}(Z_{i,j})]$ for any $f \in L^2_P$, setting $g(Z_{i,j}) = E_P[f(V_i)|Z_{i,j}]$ in (C.32) and exploiting (C.30) yields the Lemma. ■
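A finite-sample analogue of (C.32) may help fix ideas. The sketch below is an illustration under strong simplifying assumptions that are not in the paper: the instrument takes four equally likely values, and the three hypothetical basis functions are mutually orthogonal, so the second-moment matrix is diagonal and its largest eigenvalue is just the largest squared norm. The code then checks the Bessel-type bound on the sum of squared inner products.

```python
import random

def inner(f, g):
    """L2 inner product under the uniform distribution on len(f) support points."""
    return sum(x * y for x, y in zip(f, g)) / len(f)

# Hypothetical basis functions of the instrument, tabulated on 4 support points.
basis = [
    [2.0, -2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, -1.0],
    [1.0, 1.0, 1.0, 1.0],
]

# Second-moment matrix E[q q']; it is diagonal because the basis is orthogonal.
gram = [[inner(p, q) for q in basis] for p in basis]
off_diagonal_zero = all(
    abs(gram[i][j]) < 1e-12 for i in range(3) for j in range(3) if i != j
)
largest_eigenvalue = max(gram[i][i] for i in range(3))  # valid for a diagonal matrix

def bessel_sides(g):
    """Return (sum_k <g, q_k>^2, C * ||g||^2) with C the largest eigenvalue."""
    lhs = sum(inner(g, q) ** 2 for q in basis)
    rhs = largest_eigenvalue * inner(g, g)
    return lhs, rhs

random.seed(2)
draws = [[random.gauss(0, 1) for _ in range(4)] for _ in range(500)]
holds = all(lhs <= rhs + 1e-12 for lhs, rhs in map(bessel_sides, draws))
```

In the Lemma the same bound is applied to the conditional expectation of f given the instrument, which is why the right-hand side of (C.28) involves that projection rather than f itself.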


Appendix D - Proofs for Section 6 

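Proof of Theorem 6.1: First note that Lemma D.1 implies that uniformly in $P \in \mathbf{P}_0$

$$\hat U_n(R) = \inf_{\theta\in\hat\Theta_n\cap R} \inf_{\frac{h}{\sqrt n}\in\hat V_n(\theta,\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P), r} + o_p(a_n) . \tag{D.1}$$

Thus, we may select $\hat\theta_n \in \hat\Theta_n \cap R$ and $\hat h_n/\sqrt{n} \in \hat V_n(\hat\theta_n, \ell_n)$ so that uniformly in $P \in \mathbf{P}_0$

$$\hat U_n(R) = \|\mathbb{W}^\star_{n,P}\rho(\cdot,\hat\theta_n) * q_n^{k_n} + \mathbb{D}_{n,P}(\hat\theta_n)[\hat h_n]\|_{\Sigma_n(P), r} + o_p(a_n) . \tag{D.2}$$

To proceed, note that by Assumptions 5.3(i) and 6.6(ii)-(iv) we may select a $\delta_n$ so that $\delta_n \mathcal{S}_n(\mathbf{B}, \mathbf{E}) = o(r_n)$, $1\{K_f \vee K_g > 0\}\delta_n \mathcal{S}_n(\mathbf{B}, \mathbf{E}) = o(1)$, $\mathcal{R}_n + \nu_n\tau_n = o(\delta_n)$, and

$$\ell_n \delta_n \times \mathcal{S}_n(\mathbf{B}, \mathbf{E}) 1\{K_f > 0\} = o(a_n n^{-1/2}) \tag{D.3}$$
$$K_m \delta_n \ell_n \times \mathcal{S}_n(\mathbf{L}, \mathbf{E}) = o(a_n n^{-1/2}) \tag{D.4}$$
$$k_n^{1/r}\sqrt{\log(k_n)} B_n \times \sup_{P\in\mathbf{P}} J_{[\,]}(\delta_n^{\kappa_\rho}, \mathcal{F}_n, \|\cdot\|_{L^2_P}) = o(a_n) . \tag{D.5}$$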

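Next, notice that Theorem 4.1 implies that there exist $\theta_{0n} \in \Theta_{0n}(P) \cap R$ such that

$$\|\hat\theta_n - \theta_{0n}\|_{\mathbf{E}} = O_p(\mathcal{R}_n + \nu_n\tau_n) \tag{D.6}$$

uniformly in $P \in \mathbf{P}_0$. Further note that since $\|q_{k,n,j}\|_{L^\infty_P} \leq B_n$ for all $1 \leq k \leq k_{n,j}$ by Assumption 3.2(i), we obtain from Assumption 5.2(i), result (D.6) and $\mathcal{R}_n + \nu_n\tau_n = o(\delta_n)$ that with probability tending to one uniformly in $P \in \mathbf{P}_0$ we have

$$E_P[\|\rho(X_i, \hat\theta_n) - \rho(X_i, \theta_{0n})\|_2^2\, q_{k,n,j}^2(Z_{i,j})] \leq B_n^2 K_\rho^2 \delta_n^{2\kappa_\rho} . \tag{D.7}$$

Hence, letting $\mathcal{G}_n \equiv \{f(x) q_{k,n,j}(z_j) : f \in \mathcal{F}_n, 1 \leq j \leq J, \text{ and } 1 \leq k \leq k_{n,j}\}$, we obtain from $\|\Sigma_n(P)\|_{o,r}$ being uniformly bounded by Assumption 3.4(iii), results (C.18)-(C.20), Markov's inequality, and $\delta_n$ satisfying (D.5) that uniformly in $P \in \mathbf{P}$

$$\|\mathbb{W}^\star_{n,P}\rho(\cdot,\hat\theta_n) * q_n^{k_n} - \mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}) * q_n^{k_n}\|_{\Sigma_n(P), r}$$
$$\leq \|\Sigma_n(P)\|_{o,r} k_n^{1/r} \sup_{g_1,g_2\in\mathcal{G}_n : \|g_1 - g_2\|_{L^2_P} \leq B_n K_\rho \delta_n^{\kappa_\rho}} |\mathbb{W}^\star_{n,P}\, g_1 - \mathbb{W}^\star_{n,P}\, g_2| = o_p(a_n) . \tag{D.8}$$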

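Similarly, since $\hat\theta_n \in (\Theta_{0n}(P) \cap R)^{\epsilon}$ with probability tending to one uniformly in $P \in \mathbf{P}_0$ by Lemma 4.1, we can exploit Lemma D.3 to obtain for some $C < \infty$ that

$$\|\mathbb{D}_{n,P}(\theta_{0n})[\hat h_n] - \mathbb{D}_{n,P}(\hat\theta_n)[\hat h_n]\|_{\Sigma_n(P), r} \leq \|\Sigma_n(P)\|_{o,r} \times C K_m \|\hat\theta_n - \theta_{0n}\|_{\mathbf{L}} \|\hat h_n\|_{\mathbf{E}} + o_p(a_n)$$
$$\leq \|\Sigma_n(P)\|_{o,r} \times C K_m K_b \mathcal{S}_n(\mathbf{L}, \mathbf{E}) \delta_n \ell_n \sqrt{n} + o_p(a_n) = o_p(a_n) \tag{D.9}$$

where the second inequality follows from $\|\hat h_n/\sqrt{n}\|_{\mathbf{B}} \leq \ell_n$ due to $\hat h_n/\sqrt{n} \in \hat V_n(\hat\theta_n, \ell_n)$, $\|\hat h_n\|_{\mathbf{E}} \leq K_b \|\hat h_n\|_{\mathbf{B}}$ by Assumption 6.1(i), and $\mathcal{R}_n + \nu_n\tau_n = o(\delta_n)$. In turn, the final result in (D.9) is implied by (D.4) and $\|\Sigma_n(P)\|_{o,r}$ being uniformly bounded due to Assumption 3.4(iii). Next, we note that (D.6) and $\mathcal{R}_n + \nu_n\tau_n = o(\delta_n)$ imply

$$\|\hat\theta_n - \theta_{0n}\|_{\mathbf{B}} = o_p(\delta_n \times \mathcal{S}_n(\mathbf{B}, \mathbf{E})) \tag{D.10}$$

uniformly in $P \in \mathbf{P}_0$. Thus, since $\delta_n \mathcal{S}_n(\mathbf{B}, \mathbf{E}) 1\{K_f \vee K_g > 0\} = o(1)$, $\delta_n \mathcal{S}_n(\mathbf{B}, \mathbf{E}) = o(r_n)$, and $\limsup_{n\to\infty} \ell_n/r_n 1\{K_g > 0\} < 1/2$ by Assumption 6.6(iii), we obtain

$$r_n \geq (M_g \delta_n \mathcal{S}_n(\mathbf{B}, \mathbf{E}) + K_g \delta_n^2 \mathcal{S}_n^2(\mathbf{B}, \mathbf{E})) \vee 2(\ell_n + \delta_n \mathcal{S}_n(\mathbf{B}, \mathbf{E})) 1\{K_g > 0\} \tag{D.11}$$

for $n$ sufficiently large. Hence, applying Theorem E.1 and exploiting Assumption 6.1(ii), and $\|h\|_{\mathbf{E}} \leq K_b \|h\|_{\mathbf{B}}$ for all $h \in \mathbf{B}_n$ and $P \in \mathbf{P}$ by Assumption 6.1(i), we obtain that there is an $M < \infty$ for which with probability tending to one uniformly in $P \in \mathbf{P}_0$

$$\inf_{\frac{h}{\sqrt n}\in V_n(\theta_{0n}, 2K_b\ell_n)} \|\tfrac{\hat h_n}{\sqrt n} - \tfrac{h}{\sqrt n}\|_{\mathbf{B}} \leq M \ell_n(\ell_n + \delta_n \mathcal{S}_n(\mathbf{B}, \mathbf{E})) 1\{K_f > 0\} . \tag{D.12}$$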

In particular, it follows from Assumption 6.6(ii) and (D.3) that we may find a $h_{0n}/\sqrt{n} \in V_n(\theta_{0n}, 2K_b\ell_n)$ such that $\|h_{0n} - \hat h_n\|_{\mathbf{B}} = o_p(a_n)$ uniformly in $P \in \mathbf{P}_0$, and hence Assumption 3.4(iii), Lemma D.3, and $\|h\|_{\mathbf{E}} \leq K_b \|h\|_{\mathbf{B}}$ by Assumption 6.1(i) yield

$$\|\mathbb{D}_{n,P}(\theta_{0n})[\hat h_n] - \mathbb{D}_{n,P}(\theta_{0n})[h_{0n}]\|_{\Sigma_n(P), r} \leq \|\Sigma_n(P)\|_{o,r} \times C M_m \|\hat h_n - h_{0n}\|_{\mathbf{E}} = o_p(a_n) \tag{D.13}$$

uniformly in $P \in \mathbf{P}_0$. Therefore, combining results (D.2), (D.8), (D.9), and (D.13) together with $\theta_{0n} \in \Theta_{0n}(P) \cap R$ and $h_{0n}/\sqrt{n} \in V_n(\theta_{0n}, 2K_b\ell_n)$ imply

$$\hat U_n(R) = \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_{0n})[h_{0n}]\|_{\Sigma_n(P), r} + o_p(a_n)$$
$$\geq \inf_{\theta\in\Theta_{0n}(P)\cap R} \inf_{\frac{h}{\sqrt n}\in V_n(\theta, 2K_b\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P), r} + o_p(a_n) \tag{D.14}$$

uniformly in $P \in \mathbf{P}_0$, thus establishing the claim of the Theorem. ■

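Proof of Theorem 6.2: First set $\mathbf{G} = \mathbf{R}$, $\Upsilon_G(\theta) = -1$ for all $\theta \in \mathbf{B}$, and note that

$$R = \{\theta \in \mathbf{B} : \Upsilon_F(\theta) = 0\} = \{\theta \in \mathbf{B} : \Upsilon_F(\theta) = 0 \text{ and } \Upsilon_G(\theta) \leq 0\} . \tag{D.15}$$

Moreover, also note Assumption 6.2 is automatically satisfied with $K_g = M_g = 0$ and $\nabla\Upsilon_G(\theta)[h] = 0$ for all $\theta, h \in \mathbf{B}$, while Assumption 6.4 holds with $h_0 = 0$ and $\epsilon = -1$. Similarly, since $K_g = 0$, definition (68) implies $G_n(\theta) = \mathbf{B}_n$ for all $\theta \in \mathbf{B}$, and hence

$$\{\tfrac{h}{\sqrt n} \in \mathbf{B}_n : \tfrac{h}{\sqrt n} \in G_n(\theta),\ \Upsilon_F(\theta + \tfrac{h}{\sqrt n}) = 0 \text{ and } \|\tfrac{h}{\sqrt n}\|_{\mathbf{B}} \leq \ell_n\}$$
$$= \{\tfrac{h}{\sqrt n} \in \mathbf{B}_n : \Upsilon_F(\theta + \tfrac{h}{\sqrt n}) = 0 \text{ and } \|\tfrac{h}{\sqrt n}\|_{\mathbf{B}} \leq \ell_n\} . \tag{D.16}$$

Furthermore, since (D.16) holds for any $r_n$, we may set $r_n$ so Assumption 6.6(iii) holds. Thus, it follows that we may apply Theorem 6.1 to obtain uniformly in $P \in \mathbf{P}_0$

$$\hat U_n(R) \geq \inf_{\theta\in\Theta_{0n}(P)\cap R} \inf_{\frac{h}{\sqrt n}\in V_n(\theta, 2K_b\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P), r} + o_p(a_n) . \tag{D.17}$$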

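Next, note that since $(k_n^{1/r}\sqrt{\log(k_n)} J_n B_n/\sqrt{n} + \zeta_n) = o(\tau_n)$ by hypothesis, we have

$$\liminf_{n\to\infty} \inf_{P\in\mathbf{P}_0} P(\Theta_{0n}(P) \cap R \subseteq \hat\Theta_n \cap R) = 1 \tag{D.18}$$

by Lemma 4.1(ii). For notational simplicity define $\eta_n^{-1} \equiv \mathcal{S}_n(\mathbf{B}, \mathbf{E})$, and then note that $\|h\|_{\mathbf{B}} \leq \ell_n$ for any $h \in \mathbf{B}_n$ satisfying $\|h\|_{\mathbf{E}} \leq \eta_n\ell_n$. Thus, we obtain by definitions of $V_n(\theta, \ell)$ and $\hat V_n(\theta, \ell)$ that for any $P \in \mathbf{P}$ and $\theta \in \Theta_n \cap R$ we have

$$V_n(\theta, \eta_n\ell_n) = \{\tfrac{h}{\sqrt n} \in \mathbf{B}_n : \theta + \tfrac{h}{\sqrt n} \in \Theta_n \cap R \text{ and } \|\tfrac{h}{\sqrt n}\|_{\mathbf{E}} \leq \eta_n\ell_n\}$$
$$\subseteq \{\tfrac{h}{\sqrt n} \in \mathbf{B}_n : \Upsilon_F(\theta + \tfrac{h}{\sqrt n}) = 0 \text{ and } \|\tfrac{h}{\sqrt n}\|_{\mathbf{B}} \leq \ell_n\} = \hat V_n(\theta, \ell_n) . \tag{D.19}$$

Therefore, Lemma D.1 and results (D.18) and (D.19) imply that uniformly in $P \in \mathbf{P}_0$

$$\hat U_n(R) \leq \inf_{\theta\in\Theta_{0n}(P)\cap R} \inf_{\frac{h}{\sqrt n}\in V_n(\theta, \eta_n\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P), r} + o_p(a_n) . \tag{D.20}$$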

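Furthermore, we also note that the definition of $\mathcal{S}_n(\mathbf{B}, \mathbf{E})$ and Assumption 6.1(i) yield

$$\|h\|_{\mathbf{B}} \leq \mathcal{S}_n(\mathbf{B}, \mathbf{E}) \times \|h\|_{\mathbf{E}} \leq \mathcal{S}_n(\mathbf{B}, \mathbf{E}) \times K_b \|h\|_{\mathbf{B}} , \tag{D.21}$$

for any $h \in \mathbf{B}_n$, which implies $\mathcal{S}_n(\mathbf{B}, \mathbf{E}) \geq 1/K_b$, and thus $\eta_n = O(1)$. Hence, since $\mathcal{R}_n \mathcal{S}_n(\mathbf{B}, \mathbf{E}) = o(\ell_n)$, we have $\mathcal{R}_n = o(\ell_n\eta_n \wedge \ell_n)$. Similarly, Assumption 6.6(ii) implies $k_n^{1/r}\sqrt{\log(k_n)} B_n \sup_{P\in\mathbf{P}} J_{[\,]}(\ell_n^{\kappa_\rho} \vee (\ell_n\eta_n)^{\kappa_\rho}, \mathcal{F}_n, \|\cdot\|_{L^2_P}) = o(a_n)$ and $K_m(\ell_n^2 \vee \ell_n^2\eta_n^2) \times \mathcal{S}_n(\mathbf{L}, \mathbf{E}) = o(a_n n^{-1/2})$. Thus, applying Lemma D.4 with $\ell_n = \ell_n$ and $\tilde\ell_n = \ell_n\eta_n$ yields

$$\inf_{\theta\in\Theta_{0n}(P)\cap R} \inf_{\frac{h}{\sqrt n}\in V_n(\theta, \eta_n\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P), r}$$
$$= \inf_{\theta\in\Theta_{0n}(P)\cap R} \inf_{\frac{h}{\sqrt n}\in V_n(\theta, 2K_b\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P), r} + o_p(a_n) \tag{D.22}$$

uniformly in $P \in \mathbf{P}_0$. Hence, results (D.17), (D.20), and (D.22) allow us to conclude

$$\hat U_n(R) = \inf_{\theta\in\Theta_{0n}(P)\cap R} \inf_{\frac{h}{\sqrt n}\in V_n(\theta, 2K_b\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P), r} + o_p(a_n) \tag{D.23}$$

uniformly in $P \in \mathbf{P}_0$, which establishes the first claim of the Theorem.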

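In order to establish the second claim of the Theorem, we first define the set

$$N_n(\theta, \ell) \equiv \{\tfrac{h}{\sqrt n} \in \mathbf{B}_n : \nabla\Upsilon_F(\theta)[h] = 0 \text{ and } \|\tfrac{h}{\sqrt n}\|_{\mathbf{B}} \leq \ell\} . \tag{D.24}$$

Next, note that since $\Theta_{0n}(P) \cap R = \{\theta_{0n}(P)\}$, Theorem 4.1 yields uniformly in $P \in \mathbf{P}_0$

$$\overrightarrow{d}_H(\hat\Theta_n \cap R, \{\theta_{0n}(P)\}, \|\cdot\|_{\mathbf{E}}) = d_H(\hat\Theta_n \cap R, \{\theta_{0n}(P)\}, \|\cdot\|_{\mathbf{E}}) = O_p(\mathcal{R}_n + \nu_n\tau_n) . \tag{D.25}$$

Furthermore, since $(\mathcal{R}_n + \nu_n\tau_n) \times \mathcal{S}_n(\mathbf{B}, \mathbf{E}) = o(\ell_n)$, Assumptions 5.3(i) and 6.6(ii) imply there is a $\delta_n \downarrow 0$ satisfying (D.3)-(D.5), $\mathcal{R}_n + \nu_n\tau_n = o(\delta_n)$, and

$$\delta_n \times \mathcal{S}_n(\mathbf{B}, \mathbf{E}) = o(\ell_n) . \tag{D.26}$$

Moreover, identical arguments to those employed in (D.8), (D.9), and (D.10) yield

$$\sup_{\theta\in\hat\Theta_n\cap R} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} - \mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}(P)) * q_n^{k_n}\|_{\Sigma_n(P), r} = o_p(a_n) \tag{D.27}$$
$$\sup_{\theta\in\hat\Theta_n\cap R} \sup_{\frac{h}{\sqrt n}\in\hat V_n(\theta,\ell_n)} \|\mathbb{D}_{n,P}(\theta)[h] - \mathbb{D}_{n,P}(\theta_{0n}(P))[h]\|_{\Sigma_n(P), r} = o_p(a_n) \tag{D.28}$$
$$\sup_{\theta\in\hat\Theta_n\cap R} \|\theta - \theta_{0n}(P)\|_{\mathbf{B}} \times \{\mathcal{S}_n(\mathbf{B}, \mathbf{E})\}^{-1} = o_p(\delta_n) \tag{D.29}$$

uniformly in $P \in \mathbf{P}_0$. Therefore, we can conclude from Lemma D.1 and results (D.27) and (D.28) that we may select a $\hat\theta_n \in \hat\Theta_n \cap R$ so that uniformly in $P \in \mathbf{P}_0$

$$\hat U_n(R) = \inf_{\frac{h}{\sqrt n}\in\hat V_n(\hat\theta_n,\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\hat\theta_n) * q_n^{k_n} + \mathbb{D}_{n,P}(\hat\theta_n)[h]\|_{\Sigma_n(P), r} + o_p(a_n)$$
$$= \inf_{\frac{h}{\sqrt n}\in\hat V_n(\hat\theta_n,\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}(P)) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_{0n}(P))[h]\|_{\Sigma_n(P), r} + o_p(a_n) . \tag{D.30}$$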

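We next proceed by establishing upper and lower bounds for the right hand side of (D.30). To this end, note that by (D.30), we may select a $\hat h_{ln}/\sqrt{n} \in \hat V_n(\hat\theta_n, \ell_n)$ so that

$$\hat U_n(R) = \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}(P)) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_{0n}(P))[\hat h_{ln}]\|_{\Sigma_n(P), r} + o_p(a_n) \tag{D.31}$$

uniformly in $P \in \mathbf{P}_0$. Also observe that the final equality in (D.19), result (D.29), and Lemma E.1 (see (E.29)) imply that there exist $M < \infty$ and $\tilde h_{ln}/\sqrt{n}$ in $N_n(\theta_{0n}(P), 2\ell_n)$ such that with probability tending to one uniformly in $P \in \mathbf{P}_0$ we have

$$\|\tfrac{\hat h_{ln}}{\sqrt n} - \tfrac{\tilde h_{ln}}{\sqrt n}\|_{\mathbf{B}} \leq M \times \ell_n(\ell_n + \delta_n \times \mathcal{S}_n(\mathbf{B}, \mathbf{E})) 1\{K_f > 0\} . \tag{D.32}$$

Thus, we obtain from (D.32), $\|\hat h_{ln} - \tilde h_{ln}\|_{\mathbf{E}} \leq K_b \|\hat h_{ln} - \tilde h_{ln}\|_{\mathbf{B}}$, $\|\Sigma_n(P)\|_{o,r}$ being uniformly bounded by Assumption 3.4(iii), and Lemma D.3 that uniformly in $P \in \mathbf{P}_0$

$$\|\mathbb{D}_{n,P}(\theta_{0n}(P))[\hat h_{ln}] - \mathbb{D}_{n,P}(\theta_{0n}(P))[\tilde h_{ln}]\|_{\Sigma_n(P), r} \leq \|\Sigma_n(P)\|_{o,r} \times C M_m \|\hat h_{ln} - \tilde h_{ln}\|_{\mathbf{E}}$$
$$\lesssim \ell_n(\ell_n + \delta_n \times \mathcal{S}_n(\mathbf{B}, \mathbf{E})) \sqrt{n}\, 1\{K_f > 0\} + o_p(a_n) = o_p(a_n) , \tag{D.33}$$

where the final equality follows by (D.26) and $\ell_n^2 1\{K_f > 0\} = o(a_n n^{-1/2})$ by Assumption 6.6(ii). Hence, (D.31), (D.33), and $\tilde h_{ln}/\sqrt{n} \in N_n(\theta_{0n}(P), 2\ell_n)$ yield uniformly in $P \in \mathbf{P}_0$

$$\hat U_n(R) = \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}(P)) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_{0n}(P))[\tilde h_{ln}]\|_{\Sigma_n(P), r} + o_p(a_n)$$
$$\geq \inf_{\frac{h}{\sqrt n}\in N_n(\theta_{0n}(P), 2\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}(P)) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_{0n}(P))[h]\|_{\Sigma_n(P), r} + o_p(a_n) . \tag{D.34}$$

To establish an upper bound for (D.30), let $\tilde h_{un}/\sqrt{n} \in N_n(\theta_{0n}(P), \ell_n/2)$ satisfy

$$\inf_{\frac{h}{\sqrt n}\in N_n(\theta_{0n}(P), \frac{\ell_n}{2})} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}(P)) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_{0n}(P))[h]\|_{\Sigma_n(P), r}$$
$$= \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}(P)) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_{0n}(P))[\tilde h_{un}]\|_{\Sigma_n(P), r} + o_p(a_n) \tag{D.35}$$

uniformly in $P \in \mathbf{P}_0$. Next note that by Lemma E.1 (see (E.30)), the final equality in (D.19), and (D.29) we may pick a $\hat h_{un}/\sqrt{n} \in \hat V_n(\hat\theta_n, \ell_n)$ such that for some $M < \infty$

$$\|\tfrac{\hat h_{un}}{\sqrt n} - \tfrac{\tilde h_{un}}{\sqrt n}\|_{\mathbf{B}} \leq M \times \ell_n(\ell_n + \delta_n \times \mathcal{S}_n(\mathbf{B}, \mathbf{E})) 1\{K_f > 0\} \tag{D.36}$$

with probability tending to one uniformly in $P \in \mathbf{P}_0$. Therefore, exploiting Lemma D.3, $\|\Sigma_n(P)\|_{o,r}$ being uniformly bounded by Assumption 3.4(iii), result (D.36), and $\|\hat h_{un} - \tilde h_{un}\|_{\mathbf{E}} \leq K_b \|\hat h_{un} - \tilde h_{un}\|_{\mathbf{B}}$ implies that uniformly in $P \in \mathbf{P}_0$

$$\|\mathbb{D}_{n,P}(\theta_{0n}(P))[\hat h_{un}] - \mathbb{D}_{n,P}(\theta_{0n}(P))[\tilde h_{un}]\|_{\Sigma_n(P), r} \leq \|\Sigma_n(P)\|_{o,r} \times C M_m \|\hat h_{un} - \tilde h_{un}\|_{\mathbf{E}}$$
$$\lesssim \ell_n(\ell_n + \delta_n \times \mathcal{S}_n(\mathbf{B}, \mathbf{E})) \sqrt{n}\, 1\{K_f > 0\} + o_p(a_n) = o_p(a_n) , \tag{D.37}$$

where in the final equality we exploited $\ell_n^2 1\{K_f > 0\} = o(a_n n^{-1/2})$ by Assumption 6.6(ii) and $\delta_n$ satisfying (D.26). Hence, we conclude uniformly in $P \in \mathbf{P}_0$ that

$$\hat U_n(R) \leq \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}(P)) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_{0n}(P))[\hat h_{un}]\|_{\Sigma_n(P), r} + o_p(a_n)$$
$$= \inf_{\frac{h}{\sqrt n}\in N_n(\theta_{0n}(P), \frac{\ell_n}{2})} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}(P)) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_{0n}(P))[h]\|_{\Sigma_n(P), r} + o_p(a_n) \tag{D.38}$$

where the inequality follows from (D.30) and $\hat h_{un}/\sqrt{n} \in \hat V_n(\hat\theta_n, \ell_n)$, while the equality is implied by results (D.35) and (D.37).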

Finally, we obtain from results (D.34) and (D.38) together with Lemma D.5(i) that

$$\hat U_n(R) = \inf_{h\in\mathbf{B}_n\cap\mathcal{N}(\nabla\Upsilon_F(\theta_{0n}(P)))} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}(P)) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_{0n}(P))[h]\|_{\Sigma_n(P), r} + o_p(a_n) \tag{D.39}$$

uniformly in $P \in \mathbf{P}_0$. Setting $\mathbf{V}_{n,P} \equiv \{v = \Sigma_n(P)\mathbb{D}_{n,P}(\theta_{0n}(P))[h] \text{ for some } h \in \mathbf{B}_n \cap \mathcal{N}(\nabla\Upsilon_F(\theta_{0n}(P)))\}$, then note that $\mathbf{V}_{n,P}$ is a vector subspace of $\mathbf{R}^{k_n}$ by linearity of $\mathbb{D}_{n,P}(\theta_{0n}(P))$ and its dimension for $n$ sufficiently large is equal to $c_n \equiv \dim\{\mathbf{B}_n \cap \mathcal{N}(\nabla\Upsilon_F(\theta_{0n}(P)))\}$ by Lemma D.5(ii) and $\Sigma_n(P)$ being full rank by Assumption 3.4(iii). Letting $Z_n \in \mathbf{R}^{k_n}$ denote a standard normal random variable, we then obtain from $r = 2$, $\Sigma_n(P) = \{\mathrm{Var}_P\{\rho(X_i, \theta_{0n}(P)) q_n^{k_n}(Z_i)\}\}^{-1/2}$, and (D.39) that uniformly in $P \in \mathbf{P}_0$

$$\hat U_n(R) = \inf_{v\in\mathbf{V}_{n,P}} \|Z_n - v\|_2 + o_p(a_n) = \{\chi^2_{k_n - c_n}\}^{1/2} + o_p(a_n) \tag{D.40}$$

where the final equality follows by observing that the projection of $Z_n$ onto $\mathbf{V}_{n,P}$ can be written as $\Pi_P Z_n$ for some $k_n \times k_n$ idempotent matrix $\Pi_P$ of rank $c_n$. ■
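The projection step in the last display can be spelled out as follows. This is a standard argument written under the assumption that the idempotent matrix is the orthogonal projection onto the subspace (so it is also symmetric); the orthonormal vectors $u_i$ are notation introduced here only for the illustration.

```latex
\begin{align*}
\inf_{v \in \mathbf{V}_{n,P}} \|Z_n - v\|_2^2
  &= \|(I_{k_n} - \Pi_P) Z_n\|_2^2
   = Z_n' (I_{k_n} - \Pi_P) Z_n , \\
I_{k_n} - \Pi_P
  &= \sum_{i=1}^{k_n - c_n} u_i u_i'
  \quad \text{for orthonormal } u_1, \ldots, u_{k_n - c_n}
  \text{ spanning } \mathbf{V}_{n,P}^{\perp} , \\
Z_n' (I_{k_n} - \Pi_P) Z_n
  &= \sum_{i=1}^{k_n - c_n} (u_i' Z_n)^2 \sim \chi^2_{k_n - c_n} ,
\end{align*}
```

since the $u_i' Z_n$ are independent standard normal variables whenever $Z_n$ is standard normal and the $u_i$ are orthonormal. Taking square roots gives the square root of a chi-squared variable with $k_n - c_n$ degrees of freedom, as in the final display of the proof.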

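Proof of Lemma 6.1: First, let $\hat\theta_n \in \hat\Theta_n \cap R$ and $\hat h_n/\sqrt{n} \in \hat V_n(\hat\theta_n, +\infty)$ be such that

$$\inf_{\theta\in\hat\Theta_n\cap R} \inf_{\frac{h}{\sqrt n}\in\hat V_n(\theta,+\infty)} \|\hat{\mathbb{W}}_n\rho(\cdot,\theta) * q_n^{k_n} + \hat{\mathbb{D}}_n(\theta)[h]\|_{\hat\Sigma_n, r} = \|\hat{\mathbb{W}}_n\rho(\cdot,\hat\theta_n) * q_n^{k_n} + \hat{\mathbb{D}}_n(\hat\theta_n)[\hat h_n]\|_{\hat\Sigma_n, r} + o(a_n) . \tag{D.41}$$

Then note that in order to establish the claim of the Lemma it suffices to show that

$$\limsup_{n\to\infty} \sup_{P\in\mathbf{P}_0} P(\|\tfrac{\hat h_n}{\sqrt n}\|_{\mathbf{B}} \geq \ell_n) = 0 . \tag{D.42}$$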

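To this end, observe that since $0 \in \hat V_n(\theta, +\infty)$ for all $\theta \in \hat\Theta_n \cap R$, we obtain from the triangle inequality, $\|\hat\Sigma_n\|_{o,r} = O_p(1)$ by Lemma B.3 and Assumption 6.5 that

$$\|\hat{\mathbb{D}}_n(\hat\theta_n)[\hat h_n]\|_{\hat\Sigma_n, r} \leq \|\hat{\mathbb{W}}_n\rho(\cdot,\hat\theta_n) * q_n^{k_n} + \hat{\mathbb{D}}_n(\hat\theta_n)[\hat h_n]\|_{\hat\Sigma_n, r} + \|\hat{\mathbb{W}}_n\rho(\cdot,\hat\theta_n) * q_n^{k_n}\|_{\hat\Sigma_n, r}$$
$$\leq 2\|\hat{\mathbb{W}}_n\rho(\cdot,\hat\theta_n) * q_n^{k_n}\|_{\hat\Sigma_n, r} + o(a_n) \leq 2\|\hat\Sigma_n\|_{o,r} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\hat\theta_n) * q_n^{k_n}\|_r + o_p(a_n) \tag{D.43}$$

uniformly in $P \in \mathbf{P}$. Hence, since $\hat\Theta_n \cap R \subseteq \Theta_n \cap R$ almost surely, we obtain from result (D.43), $\|\hat\Sigma_n\|_{o,r} = O_p(1)$, and Lemma C.4 together with Markov's inequality

$$\|\hat{\mathbb{D}}_n(\hat\theta_n)[\hat h_n]\|_{\hat\Sigma_n, r} \leq 2\|\hat\Sigma_n\|_{o,r} \times \sup_{\theta\in\Theta_n\cap R} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n}\|_r + o_p(a_n) = O_p(k_n^{1/r}\sqrt{\log(k_n)} B_n J_n) \tag{D.44}$$

uniformly in $P \in \mathbf{P}$. Moreover, note that since $\hat\Theta_n \cap R \subseteq (\Theta_{0n}(P) \cap R)^{\epsilon}$ with probability tending to one uniformly in $P \in \mathbf{P}_0$ by Lemma 4.1, and $\hat h_n/\sqrt{n} \in \hat V_n(\hat\theta_n, +\infty)$ implies $\hat h_n \in \sqrt{n}\{\mathbf{B}_n \cap R - \hat\theta_n\}$ we obtain from the first hypothesis of the Lemma that

$$\limsup_{n\to\infty} \sup_{P\in\mathbf{P}_0} P(\ell_n \leq \|\tfrac{\hat h_n}{\sqrt n}\|_{\mathbf{B}}) = \limsup_{n\to\infty} \sup_{P\in\mathbf{P}_0} P(\ell_n \leq \|\tfrac{\hat h_n}{\sqrt n}\|_{\mathbf{B}} \text{ and } \|\hat h_n\|_{\mathbf{E}} \leq \nu_n \|\mathbb{D}_{n,P}(\hat\theta_n)[\hat h_n]\|_r)$$
$$\leq \limsup_{n\to\infty} \sup_{P\in\mathbf{P}_0} P(\ell_n \leq \|\tfrac{\hat h_n}{\sqrt n}\|_{\mathbf{B}} \text{ and } \|\hat h_n\|_{\mathbf{E}} \leq 2\nu_n \|\hat{\mathbb{D}}_n(\hat\theta_n)[\hat h_n]\|_r) , \tag{D.45}$$

where the inequality follows from (74). Hence, results (D.44) and (D.45), the definitions of $\mathcal{S}_n(\mathbf{B}, \mathbf{E})$ and $\mathcal{R}_n$, and $\mathcal{S}_n(\mathbf{B}, \mathbf{E})\mathcal{R}_n = o(\ell_n)$ by hypothesis yield

$$\limsup_{n\to\infty} \sup_{P\in\mathbf{P}_0} P(\ell_n \leq \|\tfrac{\hat h_n}{\sqrt n}\|_{\mathbf{B}}) \leq \limsup_{n\to\infty} \sup_{P\in\mathbf{P}_0} P(\ell_n \leq 2\tfrac{\nu_n}{\sqrt n} \mathcal{S}_n(\mathbf{B}, \mathbf{E}) \|\hat{\mathbb{D}}_n(\hat\theta_n)[\hat h_n]\|_r) = 0 , \tag{D.46}$$

which establishes (D.42) and hence the claim of the Lemma. ■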

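Proof of Theorem 6.3: We establish the first claim by appealing to the first claim of Lemma D.6. To this end, we note condition (D.97) holds by Assumption 6.7 and define

$$U^\star_{n,P}(R) \equiv \inf_{\theta\in\Theta_{0n}(P)\cap R} \inf_{\frac{h}{\sqrt n}\in V_n(\theta, 2K_b\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P), r} , \tag{D.47}$$

which is independent of $\{V_i\}_{i=1}^n$ by Assumption 6.5. Moreover, Theorem 6.1 yields

$$\hat U_n(R) \geq U^\star_{n,P}(R) + o_p(a_n) \tag{D.48}$$

uniformly in $P \in \mathbf{P}_0$, while Assumption 6.6(ii) implies $K_m(2K_b\ell_n)^2 \times \mathcal{S}_n(\mathbf{L}, \mathbf{E}) = o(a_n n^{-1/2})$ and $k_n^{1/r}\sqrt{\log(k_n)} B_n \times \sup_{P\in\mathbf{P}} J_{[\,]}((2K_b\ell_n)^{\kappa_\rho}, \mathcal{F}_n, \|\cdot\|_{L^2_P}) = o(a_n)$, and hence

$$I_n(R) \leq \inf_{\theta\in\Theta_{0n}(P)\cap R} \inf_{\frac{h}{\sqrt n}\in V_n(\theta, 2K_b\ell_n)} \|\mathbb{W}_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P), r} + o_p(a_n) \tag{D.49}$$

uniformly in $P \in \mathbf{P}_0$ by Theorem 5.1(i). Since the right hand side of (D.49) shares the same distribution as $U^\star_{n,P}(R)$, the first claim of the Theorem holds by Lemma D.6(i).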

For the second claim of the Theorem we first note that Theorem 6.2(i) yields that

$$\hat U_n(R) = U^\star_{n,P}(R) + o_p(a_n) \tag{D.50}$$

uniformly in $P \in \mathbf{P}_0$. Furthermore, as already argued $2K_b\ell_n$ satisfies the conditions of Theorem 5.1(i) by Assumption 6.6(ii) while $\mathcal{R}_n = o(\ell_n)$ in addition implies that

$$I_n(R) = \inf_{\theta\in\Theta_{0n}(P)\cap R} \inf_{\frac{h}{\sqrt n}\in V_n(\theta, 2K_b\ell_n)} \|\mathbb{W}_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P), r} + o_p(a_n) \tag{D.51}$$

uniformly in $P \in \mathbf{P}_0$ (see (C.12) and subsequent discussion). Hence, since the right hand side of (D.51) shares the same distribution as $U^\star_{n,P}(R)$ and condition (D.97) of Lemma D.6 holds by Assumption 6.7, the second claim of the Theorem follows from results (D.50) and (D.51), and Lemma D.6(ii).

In order to establish the final claim of the Theorem, we next note that since $\hat\Theta_n \cap R \subseteq \Theta_n \cap R$ and $0 \in \hat V_n(\theta, \ell_n)$ for all $\theta \in \Theta_n \cap R$ it follows from Assumption 6.5 and $\|\hat\Sigma_n\|_{o,r} = O_p(1)$ uniformly in $P \in \mathbf{P}$ by Lemma B.3 that we must have

$$\limsup_{n\to\infty} \sup_{P\in\mathbf{P}} P(\hat U_n(R) > M k_n^{1/r}\sqrt{\log(k_n)} B_n J_n)$$
$$\leq \limsup_{n\to\infty} \sup_{P\in\mathbf{P}} P\Big(\sup_{\theta\in\Theta_n\cap R} \|\hat{\mathbb{W}}_n\rho(\cdot,\theta) * q_n^{k_n}\|_{\hat\Sigma_n, r} > M k_n^{1/r}\sqrt{\log(k_n)} B_n J_n\Big)$$
$$= \limsup_{n\to\infty} \sup_{P\in\mathbf{P}} P\Big(\sup_{\theta\in\Theta_n\cap R} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n}\|_{\hat\Sigma_n, r} > M k_n^{1/r}\sqrt{\log(k_n)} B_n J_n\Big) . \tag{D.52}$$

Therefore, (D.52), Markov's inequality, and Lemmas B.3 and C.4 allow us to conclude

$$\limsup_{M\uparrow\infty} \limsup_{n\to\infty} \sup_{P\in\mathbf{P}} P(\hat U_n(R) > M k_n^{1/r}\sqrt{\log(k_n)} B_n J_n) = 0 . \tag{D.53}$$

We thus obtain from the definition of $\hat c_{n,1-\alpha}$, result (D.53), and Markov's inequality

$$\limsup_{M\uparrow\infty} \limsup_{n\to\infty} \sup_{P\in\mathbf{P}} P(\hat c_{n,1-\alpha} > M k_n^{1/r}\sqrt{\log(k_n)} B_n J_n)$$
$$= \limsup_{M\uparrow\infty} \limsup_{n\to\infty} \sup_{P\in\mathbf{P}} P(P(\hat U_n(R) > M k_n^{1/r}\sqrt{\log(k_n)} B_n J_n | \{V_i\}_{i=1}^n) > \alpha) = 0 . \tag{D.54}$$


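Next observe that $\|a\|_r \leq \|\hat\Sigma_n^{-1}\|_{o,r} \|a\|_{\hat\Sigma_n, r}$ for any $a \in \mathbf{R}^{k_n}$, and hence by Lemma B.2 we obtain for some $Z_{n,P} \in \mathbf{R}_+$ satisfying $\sup_{P\in\mathbf{P}} E_P[Z_{n,P}] = O(k_n^{1/r}\sqrt{\log(k_n)} J_n B_n/\sqrt{n})$

$$I_n(R) \geq \sqrt{n} \|\hat\Sigma_n^{-1}\|_{o,r}^{-1} \times \inf_{\theta\in\Theta_n\cap R} \|E_P[\rho(X_i, \theta) * q_n^{k_n}(Z_i)]\|_r - \sqrt{n} \|\hat\Sigma_n\|_{o,r} Z_{n,P} . \tag{D.55}$$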


i k 

Moreover, assuming without loss of generality that qnj is the || • projection 

of Ep[pj(Xi, 0)\Zij] onto the span of q^f we obtain by Lemma B.4 that 


inf ■ jEpIp(X i ,0)*qZ"(Z i )]}\ r 

i_i J 

> 5 „ inf' r (E pMU • (D.56) 

f/GvJnll it 

3 =1 

Thus, since the eigenvalues of Ep[q^f {Zi tJ )qn™j 3 are uniformly bounded away from 
zero by Assumption 6.8(ii), we can conclude from result (D.56) that 




J 


inf {W 

Pi nn 


qn,j Ttn,P,j 


3= 1 


J 


> 


k l ~\^J^\\Ep\p J {X u e)\Z ij \\\^}-l^ *K 0 k.-T (D.57) 

u£ky\\ri ^ 


J=1 


where the second inequality must hold for some Kq < oo by Assumption 6.8(i) and 
@n n R C 0nk. Hence, results (D.55) and (D.57) imply that for M > K$ and some 
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£q > 0 it follows that for any P G Pi jn (M) we must have 


In(R) > 11 1 11 o,r e oM k]/ r yj log ( k n ) J n B n - Vn\\t n \\ 0t rZ nt p . (D.58) 

Thus, since (D.58) holds for all P G Pp n (M) with M > Kq, Pi <n (M) C P implies that 
inf P(I n (R ) > Cn,l-a) 

PePi,„(M) 

> inf P(e 0 M\\t~ 1 \\flk// r y/log(k n )J n B n > c n ,i_ Q + Vn||S n || 0ir Z nj p) 
PePi.„(M) 

> m^P(e 0 M\\t~ 1 \\-*kl/ r ^/log(k n )JnB n > c n ,i_ Q + y/n\\Z n \\o, r Z ny p) . (D.59) 

In particular, since: (i) $\max\{\|\hat\Sigma_n\|_{o,r}, \|\hat\Sigma_n^{-1}\|_{o,r}\} = O_p(1)$ uniformly in $P \in \mathbf{P}$ by Lemma B.3, (ii) $Z_{n,P} = O_p(k_n^{1/r}\sqrt{\log(k_n)}J_nB_n/\sqrt{n})$ uniformly in $P \in \mathbf{P}$ by Markov's inequality and $\sup_{P\in\mathbf{P}} E_P[Z_{n,P}] = O(k_n^{1/r}\sqrt{\log(k_n)}B_nJ_n/\sqrt{n})$ by Lemma B.2, and (iii) $\hat c_{n,1-\alpha} = O_p(k_n^{1/r}\sqrt{\log(k_n)}J_nB_n)$ uniformly in $P \in \mathbf{P}$ by result (D.53), it follows from (D.59) that

$$\liminf_{M\uparrow\infty}\liminf_{n\to\infty}\inf_{P\in\mathbf{P}_{1,n}(M)} P(I_n(R) > \hat c_{n,1-\alpha}) = 1 , \tag{D.60}$$

which establishes the final claim of the Theorem. ■ 
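
The use of Markov's inequality in step (ii) can be spelled out as follows. This is only a sketch of a standard argument; the shorthand $b_n$ is introduced here for illustration and does not appear in the original.

```latex
% Shorthand (an assumption for this sketch): b_n = k_n^{1/r} \sqrt{\log(k_n)} J_n B_n / \sqrt{n}.
% Since Z_{n,P} \ge 0, Markov's inequality gives, for every M > 0,
\sup_{P \in \mathbf{P}} P\big( Z_{n,P} > M b_n \big)
  \;\le\; \frac{\sup_{P \in \mathbf{P}} E_P[Z_{n,P}]}{M b_n}
  \;=\; \frac{O(b_n)}{M b_n} ,
% which can be made arbitrarily small uniformly in n by taking M large,
% i.e. Z_{n,P} = O_p(b_n) uniformly in P \in \mathbf{P}.
```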

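Lemma D.1. Let Assumptions 3.1, 3.2(i)-(ii), 3.3, 3.4, 4.1, 5.1, 5.2(i), 5.3(iii), 5.4(i), 6.1, 6.5, and 6.6(i)-(ii) hold. It then follows that uniformly in $P \in \mathbf{P}_0$ we have

$$\hat U_n(R) = \inf_{\theta\in\hat\Theta_n\cap R}\ \inf_{\frac{h}{\sqrt n}\in\hat V_n(\theta,\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P),r} + o_p(a_n) . \tag{D.61}$$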


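PROOF: For an arbitrary $\epsilon > 0$, observe that Lemma 4.1 and Assumption 6.6(i) imply

$$\liminf_{n\to\infty}\inf_{P\in\mathbf{P}_0} P\big(\hat\Theta_n \cap R \subseteq (\Theta_{0n}(P)\cap R)^{\epsilon}\big) = 1 . \tag{D.62}$$

Furthermore, for any $\theta \in \hat\Theta_n \cap R$ and $h/\sqrt{n} \in \hat V_n(\theta,\ell_n)$ note that $\Upsilon_G(\theta + h/\sqrt{n}) \le 0$ and $\Upsilon_F(\theta + h/\sqrt{n}) = 0$ by definition of $\hat V_n(\theta,\ell_n)$. Thus, $\theta + h/\sqrt{n} \in R$ for any $\theta \in \hat\Theta_n \cap R$ and $h/\sqrt{n} \in \hat V_n(\theta,\ell_n)$, and hence result (D.62) and Assumption 6.1(ii) yield

$$\begin{aligned}
& \liminf_{n\to\infty}\inf_{P\in\mathbf{P}_0} P\Big(\theta + \frac{h}{\sqrt n} \in \Theta_n \cap R \text{ for all } \theta \in \hat\Theta_n \cap R \text{ and } \frac{h}{\sqrt n} \in \hat V_n(\theta,\ell_n)\Big) \\
& \quad = \liminf_{n\to\infty}\inf_{P\in\mathbf{P}_0} P\Big(\theta + \frac{h}{\sqrt n} \in \Theta_n \text{ for all } \theta \in \hat\Theta_n \cap R \text{ and } \frac{h}{\sqrt n} \in \hat V_n(\theta,\ell_n)\Big) = 1
\end{aligned} \tag{D.63}$$

due to $\|h/\sqrt{n}\|_{\mathbf B} \le \ell_n \downarrow 0$ for any $h/\sqrt{n} \in \hat V_n(\theta,\ell_n)$. Therefore, results (D.62) and (D.63) together with Assumption 6.6(ii) and Lemma D.2 yield that uniformly in $P \in \mathbf{P}_0$

$$\sup_{\theta\in\hat\Theta_n\cap R}\ \sup_{\frac{h}{\sqrt n}\in\hat V_n(\theta,\ell_n)} \|\hat{\mathbb{D}}_n(\theta)[h] - \mathbb{D}_{n,P}(\theta)[h]\|_r = o_p(a_n) . \tag{D.64}$$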


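Moreover, since $\hat\Theta_n \cap R \subseteq \Theta_n \cap R$ almost surely, we also have from Assumption 6.5 that

$$\begin{aligned}
\sup_{\theta\in\hat\Theta_n\cap R} & \|\hat{\mathbb{W}}_n\rho(\cdot,\theta) * q_n^{k_n} - \mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n}\|_r \\
& \le J \times \sup_{f\in\mathcal{F}_n} \|\hat{\mathbb{W}}_n f q_n^{k_n} - \mathbb{W}^\star_{n,P} f q_n^{k_n}\|_r = o_p(a_n)
\end{aligned} \tag{D.65}$$

uniformly in $P \in \mathbf{P}$. Therefore, since $\|\hat\Sigma_n\|_{o,r} = O_p(1)$ uniformly in $P \in \mathbf{P}$ by Lemma B.3, we obtain from results (D.64) and (D.65) and Lemma C.1 that uniformly in $P \in \mathbf{P}_0$

$$\hat U_n(R) = \inf_{\theta\in\hat\Theta_n\cap R}\ \inf_{\frac{h}{\sqrt n}\in\hat V_n(\theta,\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\hat\Sigma_n,r} + o_p(a_n) . \tag{D.66}$$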

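Next, note that by Assumption 3.4(iii) there exists a constant $C_0 < \infty$ such that $\|\Sigma_n(P)^{-1}\|_{o,r} \le C_0$ for all $n$ and $P \in \mathbf{P}$. Thus, we obtain that

$$\begin{aligned}
\|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} & + \mathbb{D}_{n,P}(\theta)[h]\|_{\hat\Sigma_n,r} \\
& \le \{C_0\|\hat\Sigma_n - \Sigma_n(P)\|_{o,r} + 1\}\,\|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P),r}
\end{aligned} \tag{D.67}$$

for any $\theta \in \Theta_n \cap R$, $h \in \mathbf{B}_n$, and $P \in \mathbf{P}$. In particular, since $0 \in \hat V_n(\theta,\ell_n)$ for any $\theta \in \hat\Theta_n \cap R$, Assumptions 3.4(iii), 5.3(iii), Markov's inequality, and Lemma C.4 yield

$$\begin{aligned}
\|\hat\Sigma_n - \Sigma_n(P)\|_{o,r} & \times \inf_{\theta\in\hat\Theta_n\cap R}\ \inf_{\frac{h}{\sqrt n}\in\hat V_n(\theta,\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P),r} \\
& \le \|\hat\Sigma_n - \Sigma_n(P)\|_{o,r} \times \sup_{\theta\in\Theta_n\cap R} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n}\|_{\Sigma_n(P),r} = o_p(a_n)
\end{aligned} \tag{D.68}$$

uniformly in $P \in \mathbf{P}$. It then follows from (D.67) and (D.68) that uniformly in $P \in \mathbf{P}$

$$\begin{aligned}
\inf_{\theta\in\hat\Theta_n\cap R}\ \inf_{\frac{h}{\sqrt n}\in\hat V_n(\theta,\ell_n)} & \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\hat\Sigma_n,r} \\
& \le \inf_{\theta\in\hat\Theta_n\cap R}\ \inf_{\frac{h}{\sqrt n}\in\hat V_n(\theta,\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P),r} + o_p(a_n) .
\end{aligned} \tag{D.69}$$

The reverse inequality to (D.69) can be obtained by identical arguments and exploiting $\max\{\|\hat\Sigma_n\|_{o,r}, \|\hat\Sigma_n^{-1}\|_{o,r}\} = O_p(1)$ uniformly in $P \in \mathbf{P}$ by Lemma B.3. The claim of the Lemma then follows from (D.66) and (D.69) (and its reverse inequality). ■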
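
The change-of-weights bound (D.67) is a consequence of the triangle inequality and the definition of the operator norm. A minimal numerical sketch follows, assuming $r = 2$ (so that the operator norm is the spectral norm) and randomly drawn matrices; the matrices, dimension, and function names are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def weighted_norm(sigma, a, r=2):
    """Return ||sigma @ a||_r, i.e. the weighted norm ||a||_{sigma, r}."""
    return np.linalg.norm(sigma @ a, ord=r)

def d67_gap(sigma_hat, sigma, a):
    """Right side minus left side of the (D.67)-type bound, for r = 2."""
    c0 = np.linalg.norm(np.linalg.inv(sigma), ord=2)   # ||Sigma^{-1}||_{o,2}
    diff = np.linalg.norm(sigma_hat - sigma, ord=2)    # ||Sigma_hat - Sigma||_{o,2}
    lhs = weighted_norm(sigma_hat, a)
    rhs = (c0 * diff + 1.0) * weighted_norm(sigma, a)
    return rhs - lhs

rng = np.random.default_rng(0)
k = 5                                                  # hypothetical dimension k_n
sigma = np.eye(k) + 0.1 * rng.standard_normal((k, k))  # stand-in for Sigma_n(P)
sigma_hat = sigma + 0.05 * rng.standard_normal((k, k)) # stand-in for its estimate
gaps = [d67_gap(sigma_hat, sigma, rng.standard_normal(k)) for _ in range(1000)]
min_gap = min(gaps)  # nonnegative (up to rounding) if the bound holds on every draw
```

The check only illustrates the deterministic inequality; the probabilistic content of the proof lies in showing that the factor multiplying the weighted norm converges to one at the required rate.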
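Lemma D.2. Let Assumptions 3.2(i)-(ii), 5.1, 5.2(i), 5.4(i), 6.1(i) hold, and define

$$D_n(\theta) \equiv \Big\{\frac{h}{\sqrt n} \in \mathbf{B}_n : \theta + \frac{h}{\sqrt n} \in \Theta_n \cap R \text{ and } \Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B} \le \ell_n\Big\} . \tag{D.70}$$

If $\ell_n \downarrow 0$ satisfies $k_n^{1/r}\sqrt{\log(k_n)}B_n \times \sup_{P\in\mathbf{P}} J_{[\,]}(\ell_n^{\kappa_\rho}, \mathcal{F}_n, \|\cdot\|_{L^2_P}) = o(a_n)$ and $K_m\ell_n^2 \times \mathcal{S}_n(\mathbf{L},\mathbf{E}) = o(a_n n^{-1/2})$, then there is an $\epsilon > 0$ such that uniformly in $P \in \mathbf{P}$

$$\sup_{\theta\in(\Theta_{0n}(P)\cap R)^{\epsilon}}\ \sup_{\frac{h}{\sqrt n}\in D_n(\theta)} \|\hat{\mathbb{D}}_n(\theta)[h] - \mathbb{D}_{n,P}(\theta)[h]\|_r = o_p(a_n) . \tag{D.71}$$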


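PROOF: By definition of the set $D_n(\theta)$, we have $\theta + h/\sqrt{n} \in \Theta_n \cap R$ for any $\theta \in \Theta_n \cap R$, $h/\sqrt{n} \in D_n(\theta)$. Therefore, since $\|h/\sqrt{n}\|_{\mathbf B} \le \ell_n$ for all $h/\sqrt{n} \in D_n(\theta)$ we obtain that

$$\begin{aligned}
\sup_{\theta\in\Theta_n\cap R}\ \sup_{\frac{h}{\sqrt n}\in D_n(\theta)} & \Big\|\hat{\mathbb{D}}_n(\theta)[h] - \sqrt{n}\Big\{P\rho\Big(\cdot,\theta + \frac{h}{\sqrt n}\Big) * q_n^{k_n} - P\rho(\cdot,\theta) * q_n^{k_n}\Big\}\Big\|_r \\
& \le \sup_{\theta_1,\theta_2\in\Theta_n\cap R:\ \|\theta_1-\theta_2\|_{\mathbf B}\le\ell_n} \|\mathbb{G}_{n,P}\rho(\cdot,\theta_1) * q_n^{k_n} - \mathbb{G}_{n,P}\rho(\cdot,\theta_2) * q_n^{k_n}\|_r .
\end{aligned} \tag{D.72}$$

Further note that Assumptions 3.2(i), 5.2(i), and 6.1(i) additionally imply that

$$\sup_{P\in\mathbf{P}}\ \sup_{\theta_1,\theta_2:\ \|\theta_1-\theta_2\|_{\mathbf B}\le\ell_n} E_P\big[\|\rho(X_i,\theta_1) - \rho(X_i,\theta_2)\|_2^2\, q_{k,n,j}^2(Z_{i,j})\big] \le B_n^2K_\rho^2K_b^{2\kappa_\rho}\ell_n^{2\kappa_\rho} . \tag{D.73}$$

Next, let $\mathcal{G}_n \equiv \{f(x)q_{k,n,j}(z) : f \in \mathcal{F}_n,\ 1 \le j \le J \text{ and } 1 \le k \le k_{n,j}\}$, and then observe that Assumption 5.1, result (D.73) and $\|v\|_r \le k_n^{1/r}\|v\|_\infty$ for any $v \in \mathbf{R}^{k_n}$ yield

$$\begin{aligned}
\sup_{\theta_1,\theta_2\in\Theta_n\cap R:\ \|\theta_1-\theta_2\|_{\mathbf B}\le\ell_n} & \|\mathbb{G}_{n,P}\rho(\cdot,\theta_1) * q_n^{k_n} - \mathbb{G}_{n,P}\rho(\cdot,\theta_2) * q_n^{k_n}\|_r \\
& \le 2Jk_n^{1/r} \times \sup_{g_1,g_2\in\mathcal{G}_n:\ \|g_1-g_2\|_{L^2_P}\le B_nK_\rho K_b^{\kappa_\rho}\ell_n^{\kappa_\rho}} |\mathbb{W}_{n,P}g_1 - \mathbb{W}_{n,P}g_2| + o_p(a_n)
\end{aligned} \tag{D.74}$$

uniformly in $P \in \mathbf{P}$. Therefore, from results (C.18)-(C.20), Markov's inequality, and $k_n^{1/r}\sqrt{\log(k_n)}B_n \times \sup_{P\in\mathbf{P}} J_{[\,]}(\ell_n^{\kappa_\rho}, \mathcal{F}_n, \|\cdot\|_{L^2_P}) = o(a_n)$ by hypothesis, we conclude

$$\sup_{\theta\in\Theta_n\cap R}\ \sup_{\frac{h}{\sqrt n}\in D_n(\theta)} \Big\|\hat{\mathbb{D}}_n(\theta)[h] - \sqrt{n}\Big\{P\rho\Big(\cdot,\theta + \frac{h}{\sqrt n}\Big) * q_n^{k_n} - P\rho(\cdot,\theta) * q_n^{k_n}\Big\}\Big\|_r = o_p(a_n) \tag{D.75}$$

uniformly in $P \in \mathbf{P}$. Moreover, setting $\epsilon > 0$ sufficiently small for Assumption 5.4(i) to hold, we then conclude from Lemmas B.4 and C.5, and Assumption 5.4(i) that

$$\begin{aligned}
\sup_{\theta\in(\Theta_{0n}(P)\cap R)^{\epsilon}}\ \sup_{\frac{h}{\sqrt n}\in D_n(\theta)} & \Big\|\sqrt{n}\Big\{P\rho\Big(\cdot,\theta + \frac{h}{\sqrt n}\Big) * q_n^{k_n} - P\rho(\cdot,\theta) * q_n^{k_n}\Big\} - \mathbb{D}_{n,P}(\theta)[h]\Big\|_r \\
& \le \sup_{\theta\in(\Theta_{0n}(P)\cap R)^{\epsilon}}\ \sup_{\frac{h}{\sqrt n}\in D_n(\theta)} \Big\{\sqrt{CJ}K_m \times \sqrt{n} \times \Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf E} \times \Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf L}\Big\}
\end{aligned} \tag{D.76}$$

for some $C < \infty$. Therefore, since $\|h\|_{\mathbf E} \le K_b\|h\|_{\mathbf B}$ for all $h \in \mathbf{B}_n$ and $P \in \mathbf{P}$ by Assumption 6.1(i), we conclude from $K_m\ell_n^2 \times \mathcal{S}_n(\mathbf{L},\mathbf{E}) = o(a_n n^{-1/2})$ that

$$\sup_{\theta\in(\Theta_{0n}(P)\cap R)^{\epsilon}}\ \sup_{\frac{h}{\sqrt n}\in D_n(\theta)} \Big\|\sqrt{n}\Big\{P\rho\Big(\cdot,\theta + \frac{h}{\sqrt n}\Big) * q_n^{k_n} - P\rho(\cdot,\theta) * q_n^{k_n}\Big\} - \mathbb{D}_{n,P}(\theta)[h]\Big\|_r = o(a_n) \tag{D.77}$$

uniformly in $P \in \mathbf{P}$. Hence, the Lemma follows from results (D.75) and (D.77). ■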
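
The step leading to (D.74) uses the elementary comparison between the $r$-norm and the sup-norm of a vector with $k_n$ coordinates, namely that the $r$-norm is at most $k_n^{1/r}$ times the sup-norm. A minimal numerical sketch of that inequality follows; the dimension 20, the values of $r$, and the function name are illustrative assumptions.

```python
import numpy as np

def norm_ratio(v, r):
    """Return ||v||_r / (k^{1/r} * ||v||_inf) for a nonzero vector v with k entries."""
    k = v.size
    return np.linalg.norm(v, ord=r) / (k ** (1.0 / r) * np.linalg.norm(v, ord=np.inf))

rng = np.random.default_rng(1)
ratios = [norm_ratio(rng.standard_normal(20), r) for r in (2, 3, 4) for _ in range(200)]
max_ratio = max(ratios)                  # never exceeds one if the inequality holds

# The bound is attained when all coordinates share the same absolute value.
tight_ratio = norm_ratio(np.ones(20), 2)
```

Because the bound is attained by constant vectors, the factor $k_n^{1/r}$ in (D.74) cannot be improved by this argument alone.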

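Lemma D.3. Let Assumptions 3.2(ii) and 5.4 hold. Then there is an $\epsilon > 0$ and $C < \infty$ such that for all $n$, $P \in \mathbf{P}$, $\theta_0 \in \Theta_{0n}(P) \cap R$, $\theta_1 \in (\Theta_{0n}(P) \cap R)^{\epsilon}$, and $h_0, h_1 \in \mathbf{B}_n$

$$\|\mathbb{D}_{n,P}(\theta_0)[h_0] - \mathbb{D}_{n,P}(\theta_1)[h_1]\|_r \le C\{M_m\|h_0 - h_1\|_{\mathbf E} + K_m\|\theta_0 - \theta_1\|_{\mathbf L}\|h_1\|_{\mathbf E}\} .$$

PROOF: We first note that by Lemmas B.4 and C.5 there is a constant $C_0 < \infty$ with

$$\|\mathbb{D}_{n,P}(\theta_0)[h_0] - \mathbb{D}_{n,P}(\theta_1)[h_1]\|_r \le \Big\{\sum_{j=1}^{J} C_0\|\nabla m_{P,j}(\theta_0)[h_0] - \nabla m_{P,j}(\theta_1)[h_1]\|_{L^2_P}^2\Big\}^{1/2} . \tag{D.78}$$

Moreover, since $(h_0 - h_1) \in \mathbf{B}_n$, we can also conclude from Assumption 5.4(iii) that

$$\|\nabla m_{P,j}(\theta_0)[h_0 - h_1]\|_{L^2_P} \le M_m \times \|h_0 - h_1\|_{\mathbf E} . \tag{D.79}$$

Similarly, letting $\epsilon > 0$ be such that Assumption 5.4(ii) holds, we further obtain

$$\|\nabla m_{P,j}(\theta_0)[h_1] - \nabla m_{P,j}(\theta_1)[h_1]\|_{L^2_P} \le K_m\|\theta_1 - \theta_0\|_{\mathbf L}\|h_1\|_{\mathbf E} \tag{D.80}$$

due to $\theta_1 \in (\Theta_{0n}(P) \cap R)^{\epsilon}$. Thus, the Lemma follows from (D.78)-(D.80). ■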
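
The way (D.79) and (D.80) combine inside (D.78) can be written out explicitly. The following is a sketch of the omitted step, using only the linearity of the derivative in its direction argument and the triangle inequality; the final constant is one admissible choice and is not claimed to be the authors'.

```latex
% Add and subtract \nabla m_{P,j}(\theta_0)[h_1] and use linearity in the direction:
\nabla m_{P,j}(\theta_0)[h_0] - \nabla m_{P,j}(\theta_1)[h_1]
  = \nabla m_{P,j}(\theta_0)[h_0 - h_1]
  + \big\{ \nabla m_{P,j}(\theta_0)[h_1] - \nabla m_{P,j}(\theta_1)[h_1] \big\} .
% The triangle inequality with (D.79) and (D.80) then gives, for each 1 \le j \le J,
\big\| \nabla m_{P,j}(\theta_0)[h_0] - \nabla m_{P,j}(\theta_1)[h_1] \big\|_{L^2_P}
  \le M_m \| h_0 - h_1 \|_{\mathbf{E}}
  + K_m \| \theta_0 - \theta_1 \|_{\mathbf{L}} \| h_1 \|_{\mathbf{E}} .
% Substituting into (D.78) yields the claim with, for example, C = \sqrt{C_0 J}.
```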

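Lemma D.4. Let Assumptions 3.1, 3.2, 3.3, 3.4, 4.1, 4.2, 5.1, 5.2, 5.3, 5.4(i), 6.5 hold. If $\ell_n$, $\tilde\ell_n$ satisfy $k_n^{1/r}\sqrt{\log(k_n)}B_n \sup_{P\in\mathbf{P}} J_{[\,]}(\ell_n^{\kappa_\rho} \vee \tilde\ell_n^{\kappa_\rho}, \mathcal{F}_n, \|\cdot\|_{L^2_P}) = o(a_n)$, $\mathcal{R}_n = o(\ell_n \wedge \tilde\ell_n)$, and $K_m(\ell_n^2 \vee \tilde\ell_n^2)\mathcal{S}_n(\mathbf{L},\mathbf{E}) = o(a_n n^{-1/2})$, then uniformly in $P \in \mathbf{P}_0$

$$\begin{aligned}
\inf_{\theta\in\Theta_{0n}(P)\cap R}\ \inf_{\frac{h}{\sqrt n}\in V_n(\theta,\ell_n)} & \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P),r} \\
& = \inf_{\theta\in\Theta_{0n}(P)\cap R}\ \inf_{\frac{h}{\sqrt n}\in V_n(\theta,\tilde\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P),r} + o_p(a_n) .
\end{aligned} \tag{D.81}$$

PROOF: For notational simplicity, for any $f : \Theta_n \cap R \to \mathbf{R}^{k_n}$, and $\ell \in \mathbf{R}_+$ define

$$T_{n,P}(f,\ell) \equiv \inf_{\theta\in\Theta_{0n}(P)\cap R}\ \inf_{\frac{h}{\sqrt n}\in V_n(\theta,\ell)} \|f(\theta) + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P),r} . \tag{D.82}$$

Next, note that since $\mathbb{W}_{n,P}$ and $\mathbb{W}^\star_{n,P}$ have the same law for every $P$, it follows that

$$\begin{aligned}
& P\big(|T_{n,P}(\mathbb{W}^\star_{n,P}\rho * q_n^{k_n},\ell_n) - T_{n,P}(\mathbb{W}^\star_{n,P}\rho * q_n^{k_n},\tilde\ell_n)| > \epsilon\big) \\
& \quad = P\big(|T_{n,P}(\mathbb{W}_{n,P}\rho * q_n^{k_n},\ell_n) - T_{n,P}(\mathbb{W}_{n,P}\rho * q_n^{k_n},\tilde\ell_n)| > \epsilon\big)
\end{aligned} \tag{D.83}$$

for any $\epsilon > 0$. However, by Lemma 5.1 we also have uniformly in $P \in \mathbf{P}_0$ that

$$\begin{aligned}
I_n(R) & = \inf_{\theta\in\Theta_{0n}(P)\cap R}\ \inf_{\frac{h}{\sqrt n}\in V_n(\theta,\ell_n)} \Big\|\mathbb{W}_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \sqrt{n}P\rho\Big(\cdot,\theta + \frac{h}{\sqrt n}\Big) * q_n^{k_n}\Big\|_{\Sigma_n(P),r} + o_p(a_n) \\
& = \inf_{\theta\in\Theta_{0n}(P)\cap R}\ \inf_{\frac{h}{\sqrt n}\in V_n(\theta,\ell_n)} \|\mathbb{W}_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P),r} + o_p(a_n) ,
\end{aligned} \tag{D.84}$$

where the second equality follows from (C.10), $K_m\ell_n^2 \times \mathcal{S}_n(\mathbf{L},\mathbf{E}) = o(a_n n^{-1/2})$ by hypothesis, and Lemma C.1. Hence, since the same arguments in (D.84) apply if we employ $\tilde\ell_n$ in place of $\ell_n$, it follows from (D.82) and (D.84) that uniformly in $P \in \mathbf{P}_0$

$$T_{n,P}(\mathbb{W}_{n,P}\rho * q_n^{k_n},\tilde\ell_n) = I_n(R) + o_p(1) = T_{n,P}(\mathbb{W}_{n,P}\rho * q_n^{k_n},\ell_n) + o_p(a_n) . \tag{D.85}$$

Thus, the claim of the Lemma follows from (D.83) and (D.85). ■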

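Lemma D.5. Let Assumptions 2.1, 2.2(i), 3.2, 3.3(ii)-(iii), 3.4(ii)-(iii), 5.4(i)-(ii), 6.1, and 6.3 hold, $\Theta_{0n}(P) \cap R = \{\theta_{0n}(P)\}$, $R$ satisfy (71), and define the set

$$N_n(\theta,\ell) \equiv \Big\{\frac{h}{\sqrt n} \in \mathbf{B}_n : \nabla\Upsilon_F(\theta)\Big[\frac{h}{\sqrt n}\Big] = 0 \text{ and } \Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B} \le \ell\Big\} .$$

Further assume that $\ell_n \downarrow 0$ satisfies $K_m\ell_n^2\mathcal{S}_n(\mathbf{L},\mathbf{E}) = o(1)$, $\ell_n^2 1\{K_f > 0\} = o(n^{-1/2})$, and $\mathcal{R}_n\mathcal{S}_n(\mathbf{B},\mathbf{E}) = o(\ell_n)$. (i) It then follows that uniformly in $P \in \mathbf{P}_0$ we have

$$\begin{aligned}
\inf_{\frac{h}{\sqrt n}\in N_n(\theta_{0n}(P),\ell_n)} & \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}(P)) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_{0n}(P))[h]\|_{\Sigma_n(P),r} \\
& = \inf_{h\in\mathbf{B}_n\cap\mathcal{N}(\nabla\Upsilon_F(\theta_{0n}(P)))} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}(P)) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_{0n}(P))[h]\|_{\Sigma_n(P),r} + o_p(a_n) .
\end{aligned}$$

(ii) For $n$ large, $\mathbb{D}_{n,P}(\theta_{0n}(P)) : \mathbf{B}_n \cap \mathcal{N}(\nabla\Upsilon_F(\theta_{0n}(P))) \to \mathbf{R}^{k_n}$ is injective for all $P \in \mathbf{P}_0$.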




Proof: To begin, select h nt p/y/n G N n (9o n (P), £ n ) so that uniformly in P G Pq 


inf 

4=eiV„(0On(P)A) 


n,P. 


p(-,0 O n(P)) * +0„,p(MP))[>l]|| E „(P), r 


i,pP(-,0On(P)) * + ®n,p(#On(P))[Vp]||s n (P),r + Op(l) • (D.86) 


Further note that for L n {0Q n {P),2£ n ) as defined in (E.27), Lemma E.l (see (E.30)) 
implies that there exists a h nt p/y/n G L n (0o n (P),2£ n ) for which for n sufficiently large 


I h n ,P hn,P I 


n 


n 


b < M x £ 2 n l{Kf > 0} 


(D.87) 


for some M < oo. Moreover, since T p(9o n (P) + h U: p/yjri) = 0 and P satisfies (71), it 
follows that 9on(P ) + hn^p/^/n G B n n 7?. Thus, we obtain from Assumption 6.1 and 
\\h n ,p/y/n\\-B < 2£ n that for n sufficiently large we have for all FgPq that 


Li.P 


n 


eV n (9 0n (P),2K b £ r 


(D.88) 


In particular, we obtain 9o n (P) + h n ^p/y/n G (&o n {P) n P) e for n sufficiently large, and 
thus from Assumption 4.2 and ©on(-P) Hfi = {$on(-P)} we can conclude that 


1e < MWEp[p(Xi, 9 0n (P ) + * ^(ZOlllr + O(Cn)} 

<^{||D r))P (0o n (P))[^]|| ? . + A m -^ x5 n (L,E) + 0(C„)} , (D.89) 
/n vn 


where the second equality in (D.89) holds by result (C.10). Moreover, also note that 

,p(M^))[^f ] -Bn,p(MP))[^]||r < 4l{A7 > 0} (D.90) 


by Lemma D.3, Assumption 6.1 (i), and result (D.87). Thus, combining results (D.87), 
(D.89), and (D.90) together with v~ l = 0(1) by Assumption 4.2 we obtain that 

" ^IIe < M\PnA^n{P))^]\\r + ^(^«S n (L, E) + l{Kf > 0}) + Cn} (D.91) 

for all P G Po for n sufficiently large. Also note that h n ^p satisfying (D.86) and 
0 G N n {9 0n (P),£ n ) imply together with ||E n (P) || 0jr being bounded uniformly in P by 
Assumption 3.4(h), Lemma C.4, and Markov’s inequality that uniformly in P G Pq 


i,p($0n(P))|An,p] lls n (P),r 

< 2\\W n>P p(;9 0n (P)) * qth n (P),r + o P ( 1) = O p {kl/ r ^^B n J n ) . (D.92) 
Hence, employing the definition of $\mathcal{R}_n$ (see (47)) and that $\nu_n/\sqrt{n} \le \mathcal{R}_n$, we can conclude
from Assumption 3.4(iii) and results (D.91) and (D.92) that uniformly in $P \in \mathbf{P}_0$


^71, P | 


n 


E = O p (Kn{ 1 + K m e 2 n S n { L, E) + > 0}}) = Op(^„) , (D.93) 


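where the second equality follows from $K_m\ell_n^2\mathcal{S}_n(\mathbf{L},\mathbf{E}) = o(1)$ and $\ell_n^2 1\{K_f > 0\} = o(n^{-1/2})$ by hypothesis. Therefore, since $\mathcal{R}_n\mathcal{S}_n(\mathbf{B},\mathbf{E}) = o(\ell_n)$ we obtain $\|\hat h_{n,P}/\sqrt{n}\|_{\mathbf B} = o_p(\ell_n)$ and hence with probability tending to one uniformly in $P \in \mathbf{P}_0$

$$\begin{aligned}
\inf_{\frac{h}{\sqrt n}\in N_n(\theta_{0n}(P),\ell_n)} & \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}(P)) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_{0n}(P))[h]\|_{\Sigma_n(P),r} \\
& = \inf_{\frac{h}{\sqrt n}\in N_n(\theta_{0n}(P),\ell_n/2)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}(P)) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_{0n}(P))[h]\|_{\Sigma_n(P),r} .
\end{aligned} \tag{D.94}$$

However, since $\mathbb{D}_{n,P}(\theta_{0n}(P)) : \mathbf{B}_n \to \mathbf{R}^{k_n}$ is linear, the function being minimized in (D.94) is convex. As a result, it follows that whenever (D.94) holds, we must also have

$$\begin{aligned}
\inf_{\frac{h}{\sqrt n}\in N_n(\theta_{0n}(P),\ell_n)} & \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}(P)) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_{0n}(P))[h]\|_{\Sigma_n(P),r} \\
& = \inf_{\frac{h}{\sqrt n}\in N_n(\theta_{0n}(P),+\infty)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}(P)) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_{0n}(P))[h]\|_{\Sigma_n(P),r} ,
\end{aligned} \tag{D.95}$$

and hence the first claim of the Lemma follows from the definition of $N_n(\theta,\ell_n)$.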

We establish the second claim of the Lemma by contradiction. Suppose there exists a 
subsequence of {n}^ =1 and sequence {Pj}f : Li Q Po such that O njt p j {0o nj (Pj)) ■ 

B n? nN(Vr F (6 0nj {Pj))) — > R fcrij is not injective, and then note the linearity of the 
map TB>nj,Pj(Qonj(Pj)) implies there exists a h c n . e B rij nA/ r (VTp(0o nj(Pj))) such that 
= 1 and ^nj,Pj (0Onj (Pj )) U^nj ] = 0. Tlien observe that (n ji',, j /yy'nj <E 
W, (0On :j (Pj), ), and that as a result (D.91) also holds with . /i(j /ydtj in place of 


hn.p/y/n. Therefore, since D n . iP j (0o nj (Pj))[hn ] = 0 we can conclude that 


/ 


< 


K, 


Pn,{C.(^5 n ,(L,E) + 1{A> > 0}) + Cn,} = 0(K nj ) 


n 


(D.96) 


where the final equality follows by exploiting the definition of lZ n and the fact that 
A' m £^5 n (L,E) = o(l) and £^l{Kf > 0} = o(n~ 2 ). However, result (D.96) and 
WK/V^We < K b \\h^/y/n\\ B = 1 by Assumption 6.1(i) imply £ nj = 0(U nj ), which 
contradicts lZ n S n ( B,E) = o(£ n ) due to {5 n (B,E)} _1 > 1/A& by (D.21), and hence the 
second claim of the Lemma follows. ■ 

Lemma D.6. Suppose there exists a $\delta > 0$ such that for all $\epsilon > 0$ and $\tilde\alpha \in [\alpha - \delta, \alpha + \delta]$

$$\sup_{P\in\mathbf{P}_0} P\big(c_{n,1-\tilde\alpha}(P) - \epsilon \le I_n(R) \le c_{n,1-\tilde\alpha}(P) + \epsilon\big) \le a_n^{-1}(\epsilon \wedge 1) + o(1) . \tag{D.97}$$

(i) If $I_n(R) \le U_{n,P}(R) + o_p(a_n)$ and $\hat U_n(R) \ge U^\star_{n,P}(R) + o_p(a_n)$ uniformly in $P \in \mathbf{P}_0$ for some $U^\star_{n,P}(R)$ independent of $\{V_i\}_{i=1}^n$ and equal in distribution to $U_{n,P}(R)$, then

$$\limsup_{n\to\infty}\sup_{P\in\mathbf{P}_0} P(I_n(R) > \hat c_{n,1-\alpha}) \le \alpha . \tag{D.98}$$

(ii) If $I_n(R) = U_{n,P}(R) + o_p(a_n^{-1})$ and $\hat U_n(R) = U^\star_{n,P}(R) + o_p(a_n^{-1})$ uniformly in $P \in \mathbf{P}_0$ for some $U^\star_{n,P}(R)$ independent of $\{V_i\}_{i=1}^n$ and equal in distribution to $U_{n,P}(R)$, then

$$\limsup_{n\to\infty}\sup_{P\in\mathbf{P}_0} |P(I_n(R) > \hat c_{n,1-\alpha}) - \alpha| = 0 . \tag{D.99}$$

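PROOF: For the first claim, note that by hypothesis there exists a positive sequence $b_n$ such that $b_n = o(a_n)$ and in addition we have uniformly in $P \in \mathbf{P}_0$ that

$$I_n(R) \le U_{n,P}(R) + o_p(b_n) \qquad \hat U_n(R) \ge U^\star_{n,P}(R) + o_p(b_n) . \tag{D.100}$$

Next, observe that by Markov's inequality and result (D.100) we can conclude that

$$\begin{aligned}
\limsup_{n\to\infty}\sup_{P\in\mathbf{P}_0} & P\big(P(U^\star_{n,P}(R) > \hat U_n(R) + b_n \,|\, \{V_i\}_{i=1}^n) > \epsilon\big) \\
& \le \limsup_{n\to\infty}\sup_{P\in\mathbf{P}_0} \frac{1}{\epsilon}P\big(U^\star_{n,P}(R) > \hat U_n(R) + b_n\big) = 0 .
\end{aligned} \tag{D.101}$$

Thus, it follows from (D.101) that there exists some sequence $\eta_n \downarrow 0$ such that the event

$$\Omega_n(P) \equiv \big\{\{V_i\}_{i=1}^n : P(U^\star_{n,P}(R) > \hat U_n(R) + b_n \,|\, \{V_i\}_{i=1}^n) \le \eta_n\big\} \tag{D.102}$$

satisfies $P(\Omega_n(P)^c) = o(1)$ uniformly in $P \in \mathbf{P}_0$. Hence, for any $t \in \mathbf{R}$ we obtain that

$$\begin{aligned}
P(\hat U_n(R) \le t \,|\, \{V_i\}_{i=1}^n)\,1\{\{V_i\}_{i=1}^n \in \Omega_n(P)\}
& \le P\big(\hat U_n(R) \le t \text{ and } U^\star_{n,P}(R) \le \hat U_n(R) + b_n \,|\, \{V_i\}_{i=1}^n\big) + \eta_n \\
& \le P\big(U^\star_{n,P}(R) \le t + b_n\big) + \eta_n ,
\end{aligned} \tag{D.103}$$

where the final inequality exploited that $U^\star_{n,P}(R)$ is independent of $\{V_i\}_{i=1}^n$. Next, define

$$q_{n,1-\alpha}(P) \equiv \inf\{u : P(U_{n,P}(R) \le u) \ge 1 - \alpha\} \tag{D.104}$$

and note that by evaluating (D.103) at $t = \hat c_{n,1-\alpha}$ we obtain that $\Omega_n(P)$ implies $\hat c_{n,1-\alpha} + b_n \ge q_{n,1-\alpha-\eta_n}(P)$. Therefore, $P(\Omega_n(P)^c) = o(1)$ uniformly in $P \in \mathbf{P}_0$ yields

$$\liminf_{n\to\infty}\inf_{P\in\mathbf{P}_0} P\big(q_{n,1-\alpha-\eta_n}(P) \le \hat c_{n,1-\alpha} + b_n\big) \ge \liminf_{n\to\infty}\inf_{P\in\mathbf{P}_0} P\big(\{V_i\}_{i=1}^n \in \Omega_n(P)\big) = 1 . \tag{D.105}$$

Furthermore, arguing as in (D.103) it follows that for some sequence $\tilde\eta_n = o(1)$ we have

$$c_{n,1-\alpha-\tilde\eta_n}(P) \le q_{n,1-\alpha-\eta_n}(P) + b_n . \tag{D.106}$$

Thus, exploiting (D.105), (D.106), condition (D.97), and $b_n = o(a_n)$, we conclude that

$$\begin{aligned}
\limsup_{n\to\infty}\sup_{P\in\mathbf{P}_0} P(I_n(R) > \hat c_{n,1-\alpha})
& \le \limsup_{n\to\infty}\sup_{P\in\mathbf{P}_0} P\big(I_n(R) > q_{n,1-\alpha-\eta_n}(P) - b_n\big) \\
& \le \limsup_{n\to\infty}\sup_{P\in\mathbf{P}_0} P\big(I_n(R) > c_{n,1-\alpha-\tilde\eta_n}(P) - 2b_n\big) = 1 - \alpha .
\end{aligned} \tag{D.107}$$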

The proof of the second claim follows arguments similar to those already employed and hence we keep the exposition more concise. Moreover, we further note that since the first part of the Lemma implies (D.97) holds, it suffices to show that

$$\liminf_{n\to\infty}\inf_{P\in\mathbf{P}_0} P(I_n(R) > \hat c_{n,1-\alpha}) \ge \alpha . \tag{D.108}$$

First, note that we may now set the sequence $b_n$ so that $b_n = o(a_n)$ and in addition

$$I_n(R) = U_{n,P}(R) + o_p(b_n) \qquad \hat U_n(R) = U^\star_{n,P}(R) + o_p(b_n) \tag{D.109}$$

uniformly in $P \in \mathbf{P}_0$. Moreover, arguing as in (D.101) implies that $P(|\hat U_n(R) - U^\star_{n,P}(R)| > b_n \,|\, \{V_i\}_{i=1}^n) = o_p(\eta_n)$ uniformly in $P \in \mathbf{P}_0$ for some $\eta_n \downarrow 0$, and therefore

$$\liminf_{n\to\infty}\inf_{P\in\mathbf{P}_0} P\big(\hat c_{n,1-\alpha} \le q_{n,1-\alpha+\eta_n}(P) + b_n\big) = 1 \tag{D.110}$$

by Lemma 11 in Chernozhukov et al. (2013). Furthermore, by analogous arguments and again relying on Lemma 11 in Chernozhukov et al. (2013) we can also conclude that

$$q_{n,1-\alpha+\eta_n}(P) \le c_{n,1-\alpha+\tilde\eta_n}(P) + b_n \tag{D.111}$$

for some $\tilde\eta_n = o(1)$. Therefore, combining results (D.110) and (D.111) we obtain

$$\begin{aligned}
\liminf_{n\to\infty}\inf_{P\in\mathbf{P}_0} P(I_n(R) > \hat c_{n,1-\alpha})
& \ge \liminf_{n\to\infty}\inf_{P\in\mathbf{P}_0} P\big(I_n(R) > q_{n,1-\alpha+\eta_n}(P) + b_n\big) \\
& \ge \liminf_{n\to\infty}\inf_{P\in\mathbf{P}_0} P\big(I_n(R) > c_{n,1-\alpha+\tilde\eta_n}(P) + 2b_n\big) = 1 - \alpha ,
\end{aligned} \tag{D.112}$$

where the final equality follows from condition (D.97). ■ 
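
The quantile comparisons in (D.105) and (D.110) rest on a monotonicity property of the generalized inverse in (D.104): if two statistics differ by at most $b_n$, so do their quantiles. A minimal sketch of the deterministic version of that property on simulated draws follows. The chi-square draws, the perturbation size, and the helper `quantile_inf` are illustrative assumptions; the proof itself works with conditional probabilities rather than pointwise bounds.

```python
import numpy as np

def quantile_inf(sample, level):
    """Smallest u in the sample whose empirical CDF value is at least `level`."""
    s = np.sort(np.asarray(sample))
    idx = int(np.ceil(level * s.size)) - 1   # first index with (idx + 1) / n >= level
    return s[max(idx, 0)]

rng = np.random.default_rng(4)
u = rng.chisquare(df=3, size=2000)           # stand-in draws of a limiting statistic
b = 0.05                                     # stand-in for the sequence b_n
u_hat = u + rng.uniform(-b, b, size=u.size)  # perturbed statistic with |u_hat - u| <= b

alpha = 0.10
q = quantile_inf(u, 1 - alpha)               # analogue of q_{n,1-alpha}(P)
c_hat = quantile_inf(u_hat, 1 - alpha)       # analogue of the bootstrap critical value
```

Since order statistics are monotone in pointwise perturbations, `c_hat` lies within `b` of `q`; in the Lemma the same conclusion holds only up to the level adjustments $\eta_n$ and $\tilde\eta_n$, because the perturbation is controlled in probability.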


Appendix E - Local Parameter Space 


In this Appendix, we develop analytical results analyzing the approximation rate of
the local parameter spaces. The main result of this Appendix is Theorem E.1, which
plays an instrumental role in the proof of the results of Section 6.


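Theorem E.1. Let Assumptions 2.1(i), 2.2(i), 6.2, 6.3, and 6.4 hold, $\{\ell_n, \delta_n, r_n\}_{n=1}^{\infty}$ satisfy $\ell_n \downarrow 0$, $\delta_n 1\{K_f > 0\} \downarrow 0$, $r_n \ge (M_g\delta_n + K_g\delta_n^2) \vee 2(\ell_n + \delta_n)1\{K_g > 0\}$, and define

$$G_n(\theta) \equiv \Big\{\frac{h}{\sqrt n} \in \mathbf{B}_n : \Upsilon_G\Big(\theta + \frac{h}{\sqrt n}\Big) \le \Big(\Upsilon_G(\theta) - K_gr_n\Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B}\mathbf{1}_{\mathbf G}\Big) \vee (-r_n\mathbf{1}_{\mathbf G})\Big\} \tag{E.1}$$

$$A_n(\theta) \equiv \Big\{\frac{h}{\sqrt n} \in \mathbf{B}_n : \frac{h}{\sqrt n} \in G_n(\theta),\ \Upsilon_F\Big(\theta + \frac{h}{\sqrt n}\Big) = 0 \text{ and } \Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B} \le \ell_n\Big\} \tag{E.2}$$

$$T_n(\theta) \equiv \Big\{\frac{h}{\sqrt n} \in \mathbf{B}_n : \Upsilon_F\Big(\theta + \frac{h}{\sqrt n}\Big) = 0,\ \Upsilon_G\Big(\theta + \frac{h}{\sqrt n}\Big) \le 0 \text{ and } \Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B} \le 2\ell_n\Big\} . \tag{E.3}$$

Then it follows that there exists a $M < \infty$, $\epsilon > 0$, and $n_0 < \infty$ such that for all $n > n_0$, $P \in \mathbf{P}$, $\theta_0 \in \Theta_{0n}(P) \cap R$, and $\theta_1 \in (\Theta_{0n}(P) \cap R)^{\epsilon}$ satisfying $\|\theta_0 - \theta_1\|_{\mathbf B} \le \delta_n$ we have

$$\sup_{\frac{h_1}{\sqrt n}\in A_n(\theta_1)}\ \inf_{\frac{h_0}{\sqrt n}\in T_n(\theta_0)} \Big\|\frac{h_1}{\sqrt n} - \frac{h_0}{\sqrt n}\Big\|_{\mathbf B} \le M \times \ell_n(\ell_n + \delta_n)1\{K_f > 0\} . \tag{E.4}$$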

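Proof: Throughout, let $\tilde\epsilon$ be such that Assumptions 6.2 and 6.3 hold, set $\epsilon = \tilde\epsilon/2$, and for any $\delta > 0$ let $N_{n,P}(\delta) \equiv \{\theta \in \mathbf{B}_n : \overrightarrow{d}_H(\{\theta\}, \Theta_{0n}(P) \cap R, \|\cdot\|_{\mathbf B}) < \delta\}$. For ease of exposition, we next break up the proof into four distinct steps.

Step 1: (Decompose $h/\sqrt{n}$). For any $P \in \mathbf{P}$, $\theta_0 \in \Theta_{0n}(P) \cap R$, and $h \in \mathbf{B}_n$ set

$$h^{\perp\theta_0} \equiv \nabla\Upsilon_F(\theta_0)^{-}\nabla\Upsilon_F(\theta_0)[h] \qquad h^{N\theta_0} \equiv h - h^{\perp\theta_0} , \tag{E.5}$$

where recall $\nabla\Upsilon_F(\theta_0)^{-} : \mathbf{F}_n \to \mathbf{B}_n$ denotes the right inverse of $\nabla\Upsilon_F(\theta_0) : \mathbf{B}_n \to \mathbf{F}_n$. Further note that $h^{N\theta_0} \in \mathcal{N}(\nabla\Upsilon_F(\theta_0))$ since $\nabla\Upsilon_F(\theta_0)\nabla\Upsilon_F(\theta_0)^{-} = I$ implies that

$$\nabla\Upsilon_F(\theta_0)[h^{N\theta_0}] = \nabla\Upsilon_F(\theta_0)[h] - \nabla\Upsilon_F(\theta_0)\nabla\Upsilon_F(\theta_0)^{-}\nabla\Upsilon_F(\theta_0)[h] = 0 , \tag{E.6}$$

by definition of $h^{N\theta_0}$ in (E.5). Next, observe that if $\theta_1 \in (\Theta_{0n}(P) \cap R)^{\epsilon}$ and $h/\sqrt{n} \in \mathbf{B}_n$ satisfies $\|h/\sqrt{n}\|_{\mathbf B} \le \ell_n$ and $\Upsilon_F(\theta_1 + h/\sqrt{n}) = 0$, then $\theta_1 + h/\sqrt{n} \in N_{n,P}(\tilde\epsilon)$ for $n$ sufficiently large, and hence by Assumption 6.3(i) and $\Upsilon_F(\theta_1) = 0$ due to $\theta_1 \in \Theta_n \cap R$

$$\Big\|\nabla\Upsilon_F(\theta_1)\Big[\frac{h}{\sqrt n}\Big]\Big\|_{\mathbf F} = \Big\|\Upsilon_F\Big(\theta_1 + \frac{h}{\sqrt n}\Big) - \Upsilon_F(\theta_1) - \nabla\Upsilon_F(\theta_1)\Big[\frac{h}{\sqrt n}\Big]\Big\|_{\mathbf F} \le K_f\Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B}^2 . \tag{E.7}$$

Therefore, Assumption 6.3(ii), result (E.7), $\|\theta_0 - \theta_1\|_{\mathbf B} \le \delta_n$, and $\|h/\sqrt{n}\|_{\mathbf B} \le \ell_n$ imply

$$\Big\|\nabla\Upsilon_F(\theta_0)\Big[\frac{h}{\sqrt n}\Big]\Big\|_{\mathbf F} \le \|\nabla\Upsilon_F(\theta_0) - \nabla\Upsilon_F(\theta_1)\|_o\Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B} + K_f\Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B}^2 \le K_f\ell_n(\delta_n + \ell_n) . \tag{E.8}$$

Moreover, since $\nabla\Upsilon_F(\theta_0)^{-} : \mathbf{F}_n \to \mathbf{B}_n$ satisfies Assumption 6.3(iv), we also have that

$$\begin{aligned}
K_f\|h^{\perp\theta_0}\|_{\mathbf B} & = K_f\|\nabla\Upsilon_F(\theta_0)^{-}\nabla\Upsilon_F(\theta_0)[h]\|_{\mathbf B} \\
& \le K_f\|\nabla\Upsilon_F(\theta_0)^{-}\|_o\|\nabla\Upsilon_F(\theta_0)[h]\|_{\mathbf F} \le M_f\|\nabla\Upsilon_F(\theta_0)[h]\|_{\mathbf F} .
\end{aligned} \tag{E.9}$$

Further note that if $K_f = 0$, then (E.5) and (E.8) imply that $h^{\perp\theta_0} = 0$. Thus, combining results (E.8) and (E.9) to handle the case $K_f > 0$ we conclude that for any $P \in \mathbf{P}$, $\theta_0 \in \Theta_{0n}(P) \cap R$, $\theta_1 \in (\Theta_{0n}(P) \cap R)^{\epsilon}$ satisfying $\|\theta_0 - \theta_1\|_{\mathbf B} \le \delta_n$ and any $h/\sqrt{n} \in \mathbf{B}_n$ such that $\Upsilon_F(\theta_1 + h/\sqrt{n}) = 0$ and $\|h/\sqrt{n}\|_{\mathbf B} \le \ell_n$ we have the norm bound

$$\Big\|\frac{h^{\perp\theta_0}}{\sqrt n}\Big\|_{\mathbf B} \le M_f\ell_n(\delta_n + \ell_n)1\{K_f > 0\} . \tag{E.10}$$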

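Step 2: (Inequality Constraints). In what follows, it is convenient to define the set

$$S_n(\theta_0,\theta_1) \equiv \Big\{\frac{h}{\sqrt n} \in \mathbf{B}_n : \Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt n}\Big) \le 0,\ \Upsilon_F\Big(\theta_1 + \frac{h}{\sqrt n}\Big) = 0, \text{ and } \Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B} \le \ell_n\Big\} . \tag{E.11}$$

Then note $r_n \ge (M_g\delta_n + K_g\delta_n^2) \vee 2(\ell_n + \delta_n)1\{K_g > 0\}$ and Lemma E.2 imply that

$$A_n(\theta_1) \subseteq S_n(\theta_0,\theta_1) \tag{E.12}$$

for $n$ sufficiently large, all $P \in \mathbf{P}$, $\theta_0 \in \Theta_{0n}(P) \cap R$, and $\theta_1 \in (\Theta_{0n}(P) \cap R)^{\epsilon}$ satisfying $\|\theta_0 - \theta_1\|_{\mathbf B} \le \delta_n$. The proof will proceed by verifying (E.4) holds with $S_n(\theta_0,\theta_1)$ in place of $A_n(\theta_1)$. In particular, if $\Upsilon_F : \mathbf{B} \to \mathbf{F}$ is linear, then $\Upsilon_F(\theta_0) = \Upsilon_F(\theta_1)$ and (E.12) implies $A_n(\theta_1) \subseteq S_n(\theta_0,\theta_1) \subseteq T_n(\theta_0)$, which establishes (E.4) for the case $K_f = 0$.

For the rest of the proof we therefore assume $K_f > 0$. We further note that Lemma E.3 implies that for any $\eta_n \downarrow 0$, there is a sufficiently large $n$ and constant $1 \le C < \infty$ (independent of $\eta_n$) such that for all $P \in \mathbf{P}$ and $\theta_0 \in \Theta_{0n}(P) \cap R$ there exists a $h_{\theta_0,n}/\sqrt{n} \in \mathbf{B}_n \cap \mathcal{N}(\nabla\Upsilon_F(\theta_0))$ such that for any $\tilde h/\sqrt{n} \in \mathbf{B}_n$ for which there exists a $h/\sqrt{n} \in S_n(\theta_0,\theta_1)$ satisfying $\|(\tilde h - h)/\sqrt{n}\|_{\mathbf B} \le \eta_n$ the following inequalities hold

$$\Upsilon_G\Big(\theta_0 + \frac{h_{\theta_0,n}}{\sqrt n} + \frac{\tilde h}{\sqrt n}\Big) \le 0 \qquad \Big\|\frac{h_{\theta_0,n}}{\sqrt n}\Big\|_{\mathbf B} \le C\eta_n . \tag{E.13}$$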

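Step 3: (Equality Constraints). The results in this step allow us to address the challenge that $h/\sqrt{n} \in S_n(\theta_0,\theta_1)$ satisfies $\Upsilon_F(\theta_1 + h/\sqrt{n}) = 0$ but not necessarily $\Upsilon_F(\theta_0 + h/\sqrt{n}) = 0$. To this end, let $\mathcal{R}(\nabla\Upsilon_F(\theta_0)^{-}\nabla\Upsilon_F(\theta_0))$ denote the range of the operator $\nabla\Upsilon_F(\theta_0)^{-}\nabla\Upsilon_F(\theta_0) : \mathbf{B}_n \to \mathbf{B}_n$ and define the vector subspaces

$$\mathbf{B}_n^{N\theta_0} \equiv \mathbf{B}_n \cap \mathcal{N}(\nabla\Upsilon_F(\theta_0)) \qquad \mathbf{B}_n^{\perp\theta_0} \equiv \mathcal{R}(\nabla\Upsilon_F(\theta_0)^{-}\nabla\Upsilon_F(\theta_0)) , \tag{E.14}$$

which note are closed due to $\mathbf{B}_n$ being finite dimensional by Assumption 3.2(iii). Moreover, since $h^{N\theta_0} \in \mathbf{B}_n^{N\theta_0}$ by (E.6), the definitions in (E.5) and (E.14) imply that $\mathbf{B}_n = \mathbf{B}_n^{N\theta_0} + \mathbf{B}_n^{\perp\theta_0}$. Furthermore, since $\nabla\Upsilon_F(\theta_0)\nabla\Upsilon_F(\theta_0)^{-} = I$, we also have

$$\nabla\Upsilon_F(\theta_0)^{-}\nabla\Upsilon_F(\theta_0)[h] = h \tag{E.15}$$

for any $h \in \mathbf{B}_n^{\perp\theta_0}$, and thus that $\mathbf{B}_n^{N\theta_0} \cap \mathbf{B}_n^{\perp\theta_0} = \{0\}$. Since $\mathbf{B}_n = \mathbf{B}_n^{N\theta_0} + \mathbf{B}_n^{\perp\theta_0}$, it then follows that $\mathbf{B}_n = \mathbf{B}_n^{N\theta_0} \oplus \mathbf{B}_n^{\perp\theta_0}$, i.e. the decomposition in (E.5) is unique. Moreover, we observe that $\mathbf{B}_n^{N\theta_0} \cap \mathbf{B}_n^{\perp\theta_0} = \{0\}$ further implies the restricted map $\nabla\Upsilon_F(\theta_0) : \mathbf{B}_n^{\perp\theta_0} \to \mathbf{F}_n$ is in fact bijective, and by (E.15) its inverse is $\nabla\Upsilon_F(\theta_0)^{-} : \mathbf{F}_n \to \mathbf{B}_n^{\perp\theta_0}$.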
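
In finite dimensions the decomposition in (E.5) and the properties (E.6) and (E.15) reduce to matrix algebra. A minimal sketch follows, assuming the derivative is represented by a full-row-rank matrix `A` and taking the Moore-Penrose right inverse as one possible choice of right inverse; the paper only requires some right inverse satisfying its assumptions, so the specific choice and all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
m, d = 2, 5                              # hypothetical dim(F_n) = m and dim(B_n) = d
A = rng.standard_normal((m, d))          # stand-in for the derivative of Upsilon_F at theta_0
A_minus = A.T @ np.linalg.inv(A @ A.T)   # one right inverse: A @ A_minus = I_m

h = rng.standard_normal(d)
h_perp = A_minus @ (A @ h)               # analogue of the "perp" component in (E.5)
h_null = h - h_perp                      # analogue of the null-space component in (E.5)

right_inverse_error = np.abs(A @ A_minus - np.eye(m)).max()
null_error = np.abs(A @ h_null).max()                              # (E.6): A h_null = 0
fixed_point_error = np.abs(A_minus @ (A @ h_perp) - h_perp).max()  # (E.15) on the range
```

All three error terms are zero up to rounding, which mirrors the argument that the two subspaces intersect only at the origin and together span the whole space.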

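We next note Assumption 6.3(i) implies that for all $n$ and $P \in \mathbf{P}$, $\Upsilon_F$ is Fréchet differentiable at all $\theta \in \mathbf{B}_n$ such that $\|\theta - \theta_0\|_{\mathbf B} \le \tilde\epsilon$ for some $\theta_0 \in \Theta_{0n}(P) \cap R$. Therefore, applying Lemma E.5 with $\mathbf{A}_1 = \mathbf{B}_n^{N\theta_0}$, $\mathbf{A}_2 = \mathbf{B}_n^{\perp\theta_0}$ and $K_0 \equiv K_f \vee M_f \vee M_f/K_f$ yields that for any $P \in \mathbf{P}$, $\theta_0 \in \Theta_{0n}(P) \cap R$ and $h^{N\theta_0} \in \mathbf{B}_n^{N\theta_0}$ satisfying $\|h^{N\theta_0}\|_{\mathbf B} \le \{\tilde\epsilon/2 \wedge (2K_0)^{-2} \wedge 1\}^2$, there exists a $h^\star(h^{N\theta_0}) \in \mathbf{B}_n^{\perp\theta_0}$ such that

$$\Upsilon_F\big(\theta_0 + h^{N\theta_0} + h^\star(h^{N\theta_0})\big) = 0 \qquad \|h^\star(h^{N\theta_0})\|_{\mathbf B} \le 2K_0^2\|h^{N\theta_0}\|_{\mathbf B}^2 . \tag{E.16}$$

In addition, note that for any $P \in \mathbf{P}$, $\theta_0 \in \Theta_{0n}(P) \cap R$, $\theta_1 \in (\Theta_{0n}(P) \cap R)^{\epsilon}$ and any $h/\sqrt{n} \in \mathbf{B}_n$ such that $\Upsilon_F(\theta_1 + h/\sqrt{n}) = 0$ and $\|h/\sqrt{n}\|_{\mathbf B} \le \ell_n$, result (E.10), the decomposition in (E.5), and $\delta_n \downarrow 0$ (since $K_f > 0$), $\ell_n \downarrow 0$ imply that for $n$ large

$$\Big\|\frac{h^{N\theta_0}}{\sqrt n}\Big\|_{\mathbf B} \le 2\ell_n . \tag{E.17}$$

Thus, for $h_{\theta_0,n} \in \mathbf{B}_n^{N\theta_0}$ as in (E.13), $C \ge 1$, and results (E.16) and (E.17) imply that for $n$ sufficiently large we must have for all $P \in \mathbf{P}$, $\theta_0 \in \Theta_{0n}(P) \cap R$, $\theta_1 \in \Theta_n \cap R$ with $\|\theta_0 - \theta_1\|_{\mathbf B} \le \delta_n$ and $h/\sqrt{n} \in \mathbf{B}_n$ satisfying $\Upsilon_F(\theta_1 + h/\sqrt{n}) = 0$ that

$$\Upsilon_F\Big(\theta_0 + \frac{h_{\theta_0,n}}{\sqrt n} + \frac{h^{N\theta_0}}{\sqrt n} + h^\star\Big(\frac{h_{\theta_0,n}}{\sqrt n} + \frac{h^{N\theta_0}}{\sqrt n}\Big)\Big) = 0 \tag{E.18}$$

$$\Big\|h^\star\Big(\frac{h_{\theta_0,n}}{\sqrt n} + \frac{h^{N\theta_0}}{\sqrt n}\Big)\Big\|_{\mathbf B} \le 16K_0^2C^2(\ell_n^2 + \eta_n^2) . \tag{E.19}$$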
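
The constant in the bound $16C^2K_0^2(\ell_n^2 + \eta_n^2)$ used in Step 4 can be traced back to (E.13), (E.16), and (E.17). The following is a sketch of the arithmetic, assuming the norm bounds $C\eta_n$ and $2\ell_n$ from (E.13) and (E.17); it is offered as a consistency check rather than as the authors' own derivation.

```latex
% Apply (E.16) to the null-space element h_{\theta_0,n}/\sqrt{n} + h^{N\theta_0}/\sqrt{n}:
\Big\| h^\star\Big( \frac{h_{\theta_0,n}}{\sqrt n} + \frac{h^{N\theta_0}}{\sqrt n} \Big) \Big\|_{\mathbf{B}}
  \le 2 K_0^2 \Big( \Big\| \frac{h_{\theta_0,n}}{\sqrt n} \Big\|_{\mathbf{B}}
                  + \Big\| \frac{h^{N\theta_0}}{\sqrt n} \Big\|_{\mathbf{B}} \Big)^2
  \le 2 K_0^2 ( C \eta_n + 2 \ell_n )^2 .
% Using (a + b)^2 \le 2 a^2 + 2 b^2 and then C \ge 1:
2 K_0^2 ( C \eta_n + 2 \ell_n )^2
  \le 4 K_0^2 ( C^2 \eta_n^2 + 4 \ell_n^2 )
  \le 16 K_0^2 C^2 ( \eta_n^2 + \ell_n^2 ) .
```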


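Step 4: (Build Approximation). In order to exploit Steps 2 and 3, we now set $\eta_n$ to

$$\eta_n = 32(M_f + C^2K_0^2)\ell_n(\ell_n + \delta_n) . \tag{E.20}$$

In addition, for any $P \in \mathbf{P}$, $\theta_0 \in \Theta_{0n}(P) \cap R$, $\theta_1 \in \Theta_n \cap R$ satisfying $\|\theta_0 - \theta_1\|_{\mathbf B} \le \delta_n$, and any $h/\sqrt{n} \in S_n(\theta_0,\theta_1)$, we let $h^{N\theta_0}$ be as in (E.5) and define

$$\frac{\hat h}{\sqrt n} \equiv \frac{h_{\theta_0,n}}{\sqrt n} + \frac{h^{N\theta_0}}{\sqrt n} + h^\star\Big(\frac{h_{\theta_0,n}}{\sqrt n} + \frac{h^{N\theta_0}}{\sqrt n}\Big) . \tag{E.21}$$

From Steps 2 and 3 it then follows that for $n$ sufficiently large (independent of $P \in \mathbf{P}$, $\theta_0 \in \Theta_{0n}(P) \cap R$, $\theta_1 \in \Theta_n \cap R$ with $\|\theta_0 - \theta_1\|_{\mathbf B} \le \delta_n$ or $h/\sqrt{n} \in S_n(\theta_0,\theta_1)$) we have

$$\Upsilon_F\Big(\theta_0 + \frac{\hat h}{\sqrt n}\Big) = 0 . \tag{E.22}$$

Moreover, from results (E.19) and (E.20) we also obtain that for $n$ sufficiently large

$$\Big\|h^\star\Big(\frac{h_{\theta_0,n}}{\sqrt n} + \frac{h^{N\theta_0}}{\sqrt n}\Big)\Big\|_{\mathbf B} \le 16C^2K_0^2(\ell_n^2 + \eta_n^2) \le \frac{\eta_n}{2} + 16C^2K_0^2\eta_n^2 \le \frac{3}{4}\eta_n . \tag{E.23}$$

Thus, $h = h^{N\theta_0} + h^{\perp\theta_0}$, (E.10), (E.20), (E.21) and (E.23) imply $\|(\hat h - h - h_{\theta_0,n})/\sqrt{n}\|_{\mathbf B} \le \eta_n$ for $n$ sufficiently large, and exploiting (E.13) with $\tilde h = \hat h - h_{\theta_0,n}$ yields

$$\Upsilon_G\Big(\theta_0 + \frac{\hat h}{\sqrt n}\Big) \le 0 . \tag{E.24}$$

Furthermore, since $\|h_{\theta_0,n}/\sqrt{n}\|_{\mathbf B} \le C\eta_n$ by (E.13), results (E.10), (E.19), and $\|h/\sqrt{n}\|_{\mathbf B} \le \ell_n$ for any $h/\sqrt{n} \in S_n(\theta_0,\theta_1)$ imply by (E.20) and $\ell_n \downarrow 0$, $\delta_n \downarrow 0$ that

$$\begin{aligned}
\Big\|\frac{\hat h}{\sqrt n}\Big\|_{\mathbf B} & \le \Big\|\frac{h_{\theta_0,n}}{\sqrt n}\Big\|_{\mathbf B} + \Big\|h^\star\Big(\frac{h_{\theta_0,n}}{\sqrt n} + \frac{h^{N\theta_0}}{\sqrt n}\Big)\Big\|_{\mathbf B} + \Big\|\frac{h^{\perp\theta_0}}{\sqrt n}\Big\|_{\mathbf B} + \Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B} \\
& \le C\eta_n + 16C^2K_0^2(\ell_n^2 + \eta_n^2) + M_f\ell_n(\delta_n + \ell_n) + \ell_n \le 2\ell_n
\end{aligned} \tag{E.25}$$

for $n$ sufficiently large. Therefore, we conclude from (E.22), (E.24), and (E.25) that $\hat h/\sqrt{n} \in T_n(\theta_0)$. Similarly, (E.10), (E.13), (E.19), and (E.20) yield for some $M < \infty$

$$\begin{aligned}
\Big\|\frac{\hat h}{\sqrt n} - \frac{h}{\sqrt n}\Big\|_{\mathbf B} & \le \Big\|\frac{h_{\theta_0,n}}{\sqrt n}\Big\|_{\mathbf B} + \Big\|h^\star\Big(\frac{h_{\theta_0,n}}{\sqrt n} + \frac{h^{N\theta_0}}{\sqrt n}\Big)\Big\|_{\mathbf B} + \Big\|\frac{h^{\perp\theta_0}}{\sqrt n}\Big\|_{\mathbf B} \\
& \le C\eta_n + 16C^2K_0^2(\ell_n^2 + \eta_n^2) + M_f\ell_n(\ell_n + \delta_n) \le M\ell_n(\ell_n + \delta_n) ,
\end{aligned} \tag{E.26}$$

which establishes (E.4) for the case $K_f > 0$. ■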

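Lemma E.1. Let Assumptions 2.1(i), 2.2(i), 6.3 hold, $\{\ell_n, \delta_n\}_{n=1}^{\infty}$ be given and define

$$L_n(\theta,\ell) \equiv \Big\{\frac{h}{\sqrt n} \in \mathbf{B}_n : \Upsilon_F\Big(\theta + \frac{h}{\sqrt n}\Big) = 0 \text{ and } \Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B} \le \ell\Big\} \tag{E.27}$$

$$N_n(\theta,\ell) \equiv \Big\{\frac{h}{\sqrt n} \in \mathbf{B}_n : \nabla\Upsilon_F(\theta)\Big[\frac{h}{\sqrt n}\Big] = 0 \text{ and } \Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B} \le \ell\Big\} . \tag{E.28}$$

If $\ell_n \downarrow 0$, $\delta_n 1\{K_f > 0\} \downarrow 0$, then there are $M < \infty$, $n_0 < \infty$, and $\epsilon > 0$ such that for all $n > n_0$, $P \in \mathbf{P}$, $\theta_0 \in \Theta_{0n}(P) \cap R$ and $\theta_1 \in (\Theta_{0n}(P) \cap R)^{\epsilon}$ with $\|\theta_1 - \theta_0\|_{\mathbf B} \le \delta_n$, we have

$$\sup_{\frac{h_1}{\sqrt n}\in L_n(\theta_1,\ell_n)}\ \inf_{\frac{h_0}{\sqrt n}\in N_n(\theta_0,2\ell_n)} \Big\|\frac{h_1}{\sqrt n} - \frac{h_0}{\sqrt n}\Big\|_{\mathbf B} \le M \times \ell_n(\ell_n + \delta_n)1\{K_f > 0\} \tag{E.29}$$

$$\sup_{\frac{h_0}{\sqrt n}\in N_n(\theta_0,\ell_n)}\ \inf_{\frac{h_1}{\sqrt n}\in L_n(\theta_1,2\ell_n)} \Big\|\frac{h_0}{\sqrt n} - \frac{h_1}{\sqrt n}\Big\|_{\mathbf B} \le M \times \ell_n(\ell_n + \delta_n)1\{K_f > 0\} . \tag{E.30}$$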


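PROOF: The proof exploits manipulations similar to those employed in Theorem E.1. First, let $\tilde\epsilon$ be such that Assumption 6.3 holds, set $\epsilon = \tilde\epsilon/2$ and note that for any $\theta_1 \in (\Theta_{0n}(P) \cap R)^{\epsilon}$ and $\|h/\sqrt{n}\|_{\mathbf B} \le \ell_n$ we have $\overrightarrow{d}_H(\{\theta_1 + h/\sqrt{n}\}, \Theta_{0n}(P) \cap R, \|\cdot\|_{\mathbf B}) < \tilde\epsilon$ for $n$ sufficiently large. In particular, if $K_f = 0$, then Assumptions 6.3(i)-(ii) yield

$$\Upsilon_F\Big(\theta_1 + \frac{h}{\sqrt n}\Big) = \nabla\Upsilon_F(\theta_1)\Big[\frac{h}{\sqrt n}\Big] = \nabla\Upsilon_F(\theta_0)\Big[\frac{h}{\sqrt n}\Big] = \Upsilon_F\Big(\theta_0 + \frac{h}{\sqrt n}\Big) . \tag{E.31}$$

Thus, $L_n(\theta_1,\ell) = L_n(\theta_0,\ell) = N_n(\theta_1,\ell) = N_n(\theta_0,\ell)$ and hence both (E.29) and (E.30) automatically hold. In what follows, we therefore assume $K_f > 0$.

Next, for each $h \in \mathbf{B}_n$, $P \in \mathbf{P}$, and $\theta \in (\Theta_{0n}(P) \cap R)^{\epsilon}$ we decompose $h$ according to

$$h^{\perp\theta} \equiv \nabla\Upsilon_F(\theta)^{-}\nabla\Upsilon_F(\theta)[h] \qquad h^{N\theta} \equiv h - h^{\perp\theta} , \tag{E.32}$$

where recall $\nabla\Upsilon_F(\theta)^{-} : \mathbf{F}_n \to \mathbf{B}_n$ denotes the right inverse of $\nabla\Upsilon_F(\theta) : \mathbf{B}_n \to \mathbf{F}_n$. Next, note that result (E.10) implies that for $n$ sufficiently large, we have for any $P \in \mathbf{P}$, $\theta_0 \in \Theta_{0n}(P) \cap R$, $\theta_1 \in (\Theta_{0n}(P) \cap R)^{\epsilon}$ satisfying $\|\theta_0 - \theta_1\|_{\mathbf B} \le \delta_n$ and $h/\sqrt{n} \in L_n(\theta_1,\ell_n)$

$$\Big\|\frac{h^{\perp\theta_0}}{\sqrt n}\Big\|_{\mathbf B} \le M_f\ell_n(\ell_n + \delta_n)1\{K_f > 0\} . \tag{E.33}$$

Furthermore, note that for any $h/\sqrt{n} \in L_n(\theta_1,\ell_n)$ and $n$ sufficiently large $h^{N\theta_0}/\sqrt{n}$ satisfies $\nabla\Upsilon_F(\theta_0)[h^{N\theta_0}] = 0$ by (E.6) and $\|h^{N\theta_0}/\sqrt{n}\|_{\mathbf B} \le 2\ell_n$ by (E.17), and thus $h^{N\theta_0}/\sqrt{n} \in N_n(\theta_0,2\ell_n)$. In particular, it follows that for any $P \in \mathbf{P}$, $\theta_0 \in \Theta_{0n}(P) \cap R$, and $\theta_1 \in (\Theta_{0n}(P) \cap R)^{\epsilon}$ with $\|\theta_0 - \theta_1\|_{\mathbf B} \le \delta_n$ we must have that

$$\begin{aligned}
\sup_{\frac{h}{\sqrt n}\in L_n(\theta_1,\ell_n)}\ \inf_{\frac{\tilde h}{\sqrt n}\in N_n(\theta_0,2\ell_n)} \Big\|\frac{h}{\sqrt n} - \frac{\tilde h}{\sqrt n}\Big\|_{\mathbf B}
& \le \sup_{\frac{h}{\sqrt n}\in L_n(\theta_1,\ell_n)} \Big\|\frac{h}{\sqrt n} - \frac{h^{N\theta_0}}{\sqrt n}\Big\|_{\mathbf B} \\
& = \sup_{\frac{h}{\sqrt n}\in L_n(\theta_1,\ell_n)} \Big\|\frac{h^{\perp\theta_0}}{\sqrt n}\Big\|_{\mathbf B} \le M_f\ell_n(\ell_n + \delta_n)1\{K_f > 0\} ,
\end{aligned} \tag{E.34}$$

where the first inequality follows from $h^{N\theta_0}/\sqrt{n} \in N_n(\theta_0,2\ell_n)$, the equality from (E.32), and the second inequality is implied by (E.33). Thus, (E.29) follows.

In order to establish (E.30) when $K_f > 0$ note that for any $h/\sqrt{n} \in N_n(\theta_0,\ell_n)$, $\nabla\Upsilon_F(\theta_0)[h/\sqrt{n}] = 0$, $\|\theta_0 - \theta_1\|_{\mathbf B} \le \delta_n$, and Assumption 6.3(ii) imply that

$$\Big\|\nabla\Upsilon_F(\theta_1)\Big[\frac{h}{\sqrt n}\Big]\Big\|_{\mathbf F} = \Big\|\nabla\Upsilon_F(\theta_1)\Big[\frac{h}{\sqrt n}\Big] - \nabla\Upsilon_F(\theta_0)\Big[\frac{h}{\sqrt n}\Big]\Big\|_{\mathbf F} \le K_f\delta_n\ell_n . \tag{E.35}$$

Therefore, from definition (E.32), Assumption 6.3(iv) and result (E.35) we can conclude

$$\Big\|\frac{h^{\perp\theta_1}}{\sqrt n}\Big\|_{\mathbf B} = \Big\|\nabla\Upsilon_F(\theta_1)^{-}\nabla\Upsilon_F(\theta_1)\Big[\frac{h}{\sqrt n}\Big]\Big\|_{\mathbf B} \le M_f\delta_n\ell_n . \tag{E.36}$$

Moreover, identical arguments to those employed in establishing (E.16) (with $\theta_1$ in place of $\theta_0$) imply that for sufficiently large $n$ it follows that for all $P \in \mathbf{P}$, $\theta_0 \in \Theta_{0n}(P) \cap R$, and $h/\sqrt{n} \in N_n(\theta_0,\ell_n)$ there is a $h^\star(h^{N\theta_1}/\sqrt{n})$ such that for some $K_0 < \infty$

$$\Upsilon_F\Big(\theta_1 + \frac{h^{N\theta_1}}{\sqrt n} + h^\star\Big(\frac{h^{N\theta_1}}{\sqrt n}\Big)\Big) = 0 \qquad \Big\|h^\star\Big(\frac{h^{N\theta_1}}{\sqrt n}\Big)\Big\|_{\mathbf B} \le 2K_0^2\Big\|\frac{h^{N\theta_1}}{\sqrt n}\Big\|_{\mathbf B}^2 . \tag{E.37}$$

Since $\ell_n, \delta_n \downarrow 0$, it follows that $M_f\delta_n\ell_n + 2K_0^2(\ell_n + M_f\delta_n\ell_n)^2 \le \ell_n$ for $n$ sufficiently large. Therefore, $\|h^{N\theta_1}/\sqrt{n}\|_{\mathbf B} \le \ell_n + \|h^{\perp\theta_1}/\sqrt{n}\|_{\mathbf B}$, (E.36) and (E.37) imply that for $n$ sufficiently large, we have for all $P \in \mathbf{P}$, $\theta_0 \in \Theta_{0n}(P) \cap R$ and $h/\sqrt{n} \in N_n(\theta_0,\ell_n)$ that

$$\frac{h^{N\theta_1}}{\sqrt n} + h^\star\Big(\frac{h^{N\theta_1}}{\sqrt n}\Big) \in L_n(\theta_1,2\ell_n) . \tag{E.38}$$

Hence, for $n$ sufficiently large we can conclude from result (E.38) that for all $P \in \mathbf{P}$, $\theta_0 \in \Theta_{0n}(P) \cap R$, and $\theta_1 \in \Theta_n \cap R$ with $\|\theta_0 - \theta_1\|_{\mathbf B} \le \delta_n$ we have that

$$\begin{aligned}
\sup_{\frac{h}{\sqrt n}\in N_n(\theta_0,\ell_n)}\ \inf_{\frac{\tilde h}{\sqrt n}\in L_n(\theta_1,2\ell_n)} \Big\|\frac{h}{\sqrt n} - \frac{\tilde h}{\sqrt n}\Big\|_{\mathbf B}
& \le \sup_{\frac{h}{\sqrt n}\in N_n(\theta_0,\ell_n)} \Big\|\frac{h}{\sqrt n} - \Big\{\frac{h^{N\theta_1}}{\sqrt n} + h^\star\Big(\frac{h^{N\theta_1}}{\sqrt n}\Big)\Big\}\Big\|_{\mathbf B} \\
& \le \sup_{\frac{h}{\sqrt n}\in N_n(\theta_0,\ell_n)} \Big\{\Big\|\frac{h^{\perp\theta_1}}{\sqrt n}\Big\|_{\mathbf B} + \Big\|h^\star\Big(\frac{h^{N\theta_1}}{\sqrt n}\Big)\Big\|_{\mathbf B}\Big\} \\
& \le M_f\delta_n\ell_n + 2K_0^2(\ell_n + M_f\delta_n\ell_n)^2
\end{aligned} \tag{E.39}$$

where the final inequality holds by (E.36), (E.37) and $\|h^{N\theta_1}/\sqrt{n}\|_{\mathbf B} \le \ell_n + M_f\delta_n\ell_n$. Thus, (E.30) follows from (E.39), which establishes the claim of the Lemma. ■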

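Lemma E.2. Let Assumptions 2.1(i), 2.2(i), and 6.2 hold, and $\ell_n \downarrow 0$ be given. Then, there exist $n_0 < \infty$ and $\epsilon > 0$ such that for all $n > n_0$, $P \in \mathbf{P}$, $\theta_0 \in \Theta_{0n}(P) \cap R$, and $\theta_1 \in \mathbf{B}_n$ satisfying $\overrightarrow{d}_H(\{\theta_1\}, \Theta_{0n}(P) \cap R, \|\cdot\|_{\mathbf B}) < \epsilon$ it follows that

$$\begin{aligned}
& \Big\{\frac{h}{\sqrt n} \in \mathbf{B}_n : \Upsilon_G\Big(\theta_1 + \frac{h}{\sqrt n}\Big) \le \Big(\Upsilon_G(\theta_1) - K_gr\Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B}\mathbf{1}_{\mathbf G}\Big) \vee (-r\mathbf{1}_{\mathbf G}), \text{ and } \Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B} \le \ell_n\Big\} \\
& \quad \subseteq \Big\{\frac{h}{\sqrt n} \in \mathbf{B}_n : \Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt n}\Big) \le 0 \text{ and } \Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B} \le \ell_n\Big\}
\end{aligned} \tag{E.40}$$

for any $r \ge \{M_g\|\theta_0 - \theta_1\|_{\mathbf B} + K_g\|\theta_0 - \theta_1\|_{\mathbf B}^2\} \vee 2\{\ell_n + \|\theta_0 - \theta_1\|_{\mathbf B}\}1\{K_g > 0\}$.

PROOF: Let $\tilde\epsilon > 0$ be such that Assumption 6.2 holds, set $\epsilon = \tilde\epsilon/2$, and for notational simplicity let $N_{n,P}(\delta) \equiv \{\theta \in \mathbf{B}_n : \overrightarrow{d}_H(\{\theta\}, \Theta_{0n}(P) \cap R, \|\cdot\|_{\mathbf B}) < \delta\}$ for any $\delta > 0$. Then note that for any $\theta_1 \in N_{n,P}(\epsilon)$ and $\|h/\sqrt{n}\|_{\mathbf B} \le \ell_n$ we have $\theta_1 + h/\sqrt{n} \in N_{n,P}(\tilde\epsilon)$ for $n$ sufficiently large. Therefore, by Assumption 6.2(ii) we obtain that

$$\Big\|\Upsilon_G\Big(\theta_1 + \frac{h}{\sqrt n}\Big) - \Upsilon_G(\theta_1) - \nabla\Upsilon_G(\theta_1)\Big[\frac{h}{\sqrt n}\Big]\Big\|_{\mathbf G} \le K_g\Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B}^2 . \tag{E.41}$$

Similarly, Assumption 6.2(ii) implies that if $\theta_0 \in \Theta_{0n}(P) \cap R$ and $\theta_1 \in N_{n,P}(\epsilon)$, then

$$\begin{aligned}
\Big\|\nabla\Upsilon_G(\theta_0)\Big[\frac{h}{\sqrt n}\Big] - \nabla\Upsilon_G(\theta_1)\Big[\frac{h}{\sqrt n}\Big]\Big\|_{\mathbf G}
& \le \|\nabla\Upsilon_G(\theta_0) - \nabla\Upsilon_G(\theta_1)\|_o\Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B} \\
& \le K_g\|\theta_0 - \theta_1\|_{\mathbf B}\Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B}
\end{aligned} \tag{E.42}$$

for any $h/\sqrt{n} \in \mathbf{B}_n$. Hence, since $\Upsilon_G(\theta_0) \le 0$ due to $\theta_0 \in \Theta_n \cap R$ we can conclude that

$$\begin{aligned}
\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt n}\Big) & + \Big\{\Upsilon_G(\theta_1) - \Upsilon_G\Big(\theta_1 + \frac{h}{\sqrt n}\Big)\Big\} \\
& \le \Big\{\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt n}\Big) - \Upsilon_G(\theta_0)\Big\} + \Big\{\Upsilon_G(\theta_1) - \Upsilon_G\Big(\theta_1 + \frac{h}{\sqrt n}\Big)\Big\} \\
& \le K_g\Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B}\Big\{2\Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B} + \|\theta_0 - \theta_1\|_{\mathbf B}\Big\}\mathbf{1}_{\mathbf G} ,
\end{aligned} \tag{E.43}$$

by (E.41), (E.42), and Lemma E.4. Also note for any $\theta_0 \in \Theta_{0n}(P) \cap R$, $\theta_1 \in N_{n,P}(\epsilon)$ and $h/\sqrt{n} \in \mathbf{B}_n$ with $\|h/\sqrt{n}\|_{\mathbf B} \le \ell_n$ we have $\theta_0 + h/\sqrt{n} \in N_{n,P}(\tilde\epsilon)$ and $\theta_1 + h/\sqrt{n} \in N_{n,P}(\tilde\epsilon)$ for $n$ sufficiently large. Therefore, Assumptions 6.2(i), 6.2(iii), and Lemma E.4 yield

$$\begin{aligned}
\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt n}\Big) - \Upsilon_G\Big(\theta_1 + \frac{h}{\sqrt n}\Big)
& \le \nabla\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt n}\Big)[\theta_0 - \theta_1] + K_g\|\theta_0 - \theta_1\|_{\mathbf B}^2\mathbf{1}_{\mathbf G} \\
& \le \{M_g\|\theta_0 - \theta_1\|_{\mathbf B} + K_g\|\theta_0 - \theta_1\|_{\mathbf B}^2\}\mathbf{1}_{\mathbf G} .
\end{aligned} \tag{E.44}$$

Hence, (E.43) and (E.44) yield for $r \ge \{M_g\|\theta_0 - \theta_1\|_{\mathbf B} + K_g\|\theta_0 - \theta_1\|_{\mathbf B}^2\} \vee 2\{\ell_n + \|\theta_0 - \theta_1\|_{\mathbf B}\}1\{K_g > 0\}$, $\theta_0 \in \Theta_{0n}(P) \cap R$, $\theta_1 \in N_{n,P}(\epsilon)$, $\|h/\sqrt{n}\|_{\mathbf B} \le \ell_n$, and $n$ large

$$\begin{aligned}
\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt n}\Big)
& \le \Upsilon_G\Big(\theta_1 + \frac{h}{\sqrt n}\Big) + \Big(K_gr\Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B}\mathbf{1}_{\mathbf G} - \Upsilon_G(\theta_1)\Big) \wedge r\mathbf{1}_{\mathbf G} \\
& = \Upsilon_G\Big(\theta_1 + \frac{h}{\sqrt n}\Big) - \Big(\Upsilon_G(\theta_1) - K_gr\Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B}\mathbf{1}_{\mathbf G}\Big) \vee (-r\mathbf{1}_{\mathbf G})
\end{aligned} \tag{E.45}$$

where the equality follows from $(-a) \vee (-b) = -(a \wedge b)$ by Theorem 8.6 in Aliprantis and Border (2006). Thus, since $a_1 \le a_2$ and $b_1 \le b_2$ implies $a_1 \wedge b_1 \le a_2 \wedge b_2$ in $\mathbf{G}$ by Corollary 8.7 in Aliprantis and Border (2006), (E.45) implies that for $n$ sufficiently large and any $\theta_0 \in \Theta_{0n}(P) \cap R$, $\theta_1 \in N_{n,P}(\epsilon)$ and $h/\sqrt{n} \in \mathbf{B}_n$ satisfying $\|h/\sqrt{n}\|_{\mathbf B} \le \ell_n$ and

$$\Upsilon_G\Big(\theta_1 + \frac{h}{\sqrt n}\Big) \le \Big(\Upsilon_G(\theta_1) - K_gr\Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B}\mathbf{1}_{\mathbf G}\Big) \vee (-r\mathbf{1}_{\mathbf G}) \tag{E.46}$$

we must have $\Upsilon_G(\theta_0 + h/\sqrt{n}) \le 0$, which verifies (E.40) indeed holds. ■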
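
The two lattice facts invoked for (E.45) and (E.46) can be checked concretely in the simplest Banach lattice, Euclidean space with the coordinatewise order, where the join and meet are the coordinatewise maximum and minimum. A minimal sketch follows; the choice of $\mathbf{R}^4$ and the random draws are illustrative assumptions, since the space $\mathbf{G}$ in the paper is a general Banach lattice.

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.standard_normal((1000, 4))
b = rng.standard_normal((1000, 4))

# (-a) v (-b) = -(a ^ b): join is coordinatewise max, meet is coordinatewise min.
identity_holds = np.array_equal(np.maximum(-a, -b), -np.minimum(a, b))

# Monotonicity of the meet: a <= a2 and b <= b2 imply (a ^ b) <= (a2 ^ b2).
a2 = a + rng.random(a.shape)   # a <= a2 coordinatewise
b2 = b + rng.random(b.shape)   # b <= b2 coordinatewise
meet_is_monotone = bool(np.all(np.minimum(a, b) <= np.minimum(a2, b2)))
```

The first fact converts the meet on the right side of (E.45) into the join appearing in (E.46); the second is what allows an upper bound on each argument of a meet to be carried through to the meet itself.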

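Lemma E.3. If Assumptions 2.1(i), 2.2(i), 6.2, 6.4(ii) hold, and $\eta_n \downarrow 0$, $\ell_n \downarrow 0$, then there is a $n_0$ (depending on $\eta_n, \ell_n$) and a $C < \infty$ (independent of $\eta_n, \ell_n$) such that for all $n > n_0$, $P \in \mathbf P$, and $\theta_0 \in \Theta_{0n}(P) \cap R$ there is $h_{\theta_0,n}/\sqrt n \in \mathbf B_n \cap \mathcal N(\nabla\Upsilon_F(\theta_0))$ with

$$\Upsilon_G\Big(\theta_0 + \frac{h_{\theta_0,n}}{\sqrt n} + \frac{\tilde h}{\sqrt n}\Big) \le 0 , \qquad \Big\|\frac{h_{\theta_0,n}}{\sqrt n}\Big\|_{\mathbf B} \le C\eta_n \qquad \text{(E.47)}$$

for all $\tilde h/\sqrt n \in \mathbf B_n$ for which there is a $h/\sqrt n \in \mathbf B_n$ satisfying $\|\tilde h/\sqrt n - h/\sqrt n\|_{\mathbf B} \le \eta_n$, $\|h/\sqrt n\|_{\mathbf B} \le \ell_n$ and the inequality $\Upsilon_G(\theta_0 + h/\sqrt n) \le 0$.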

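PROOF: By Assumption 6.4(ii) there are $\epsilon > 0$ and $K_d < \infty$ such that for every $P \in \mathbf P$, $n$, and $\theta_0 \in \Theta_{0n}(P) \cap R$ there exists a $\bar h_{\theta_0,n} \in \mathbf B_n \cap \mathcal N(\nabla\Upsilon_F(\theta_0))$ satisfying

$$\Upsilon_G(\theta_0) + \nabla\Upsilon_G(\theta_0)[\bar h_{\theta_0,n}] \le -\epsilon 1_{\mathbf G} \qquad \text{(E.48)}$$

and $\|\bar h_{\theta_0,n}\|_{\mathbf B} \le K_d$. Moreover, for any $h/\sqrt n \in \mathbf B_n$ such that $\|h/\sqrt n\|_{\mathbf B} \le \ell_n$, Assumption 6.2(i), Lemma E.4 and $K_g\ell_n^2 \le M_g\ell_n$ for $n$ sufficiently large yield

$$\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt n}\Big) \le \Upsilon_G(\theta_0) + \nabla\Upsilon_G(\theta_0)\Big[\frac{h}{\sqrt n}\Big] + K_g\Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf B}^2 1_{\mathbf G}$$

$$\le \Upsilon_G(\theta_0) + \{\|\nabla\Upsilon_G(\theta_0)\|_o\ell_n + K_g\ell_n^2\}1_{\mathbf G} \le \Upsilon_G(\theta_0) + 2M_g\ell_n 1_{\mathbf G} . \qquad \text{(E.49)}$$

Hence, (E.48) and (E.49) imply for any $h/\sqrt n \in \mathbf B_n$ with $\|h/\sqrt n\|_{\mathbf B} \le \ell_n$ we must have

$$\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt n}\Big) + \nabla\Upsilon_G(\theta_0)[\bar h_{\theta_0,n}] \le \{2M_g\ell_n - \epsilon\}1_{\mathbf G} . \qquad \text{(E.50)}$$

Next, we let $C_0 > 8M_g/\epsilon$ and aim to show (E.47) holds with $C = C_0K_d$ by setting

$$\frac{h_{\theta_0,n}}{\sqrt n} \equiv C_0\eta_n\bar h_{\theta_0,n} . \qquad \text{(E.51)}$$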

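To this end, we first note that if $\theta_0 \in \Theta_{0n}(P) \cap R$, $h/\sqrt n \in \mathbf B_n$ satisfies $\|h/\sqrt n\|_{\mathbf B} \le \ell_n$ and $\Upsilon_G(\theta_0 + h/\sqrt n) \le 0$, and $\tilde h/\sqrt n \in \mathbf B_n$ is such that $\|h/\sqrt n - \tilde h/\sqrt n\|_{\mathbf B} \le \eta_n$, then definition (E.51) implies that $\|\theta_0 + (h_{\theta_0,n} + \tilde h)/\sqrt n - \theta_0\|_{\mathbf B} = o(1)$. Therefore, Assumption 6.2(i), Lemma E.4, and $\|(h - \tilde h)/\sqrt n\|_{\mathbf B} \le \eta_n$ together allow us to conclude

$$\Upsilon_G\Big(\theta_0 + \frac{h_{\theta_0,n}}{\sqrt n} + \frac{\tilde h}{\sqrt n}\Big) \le \Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt n}\Big) + \nabla\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt n}\Big)\Big[\frac{h_{\theta_0,n}}{\sqrt n} + \frac{\tilde h - h}{\sqrt n}\Big] + 2K_g\Big(\Big\|\frac{h_{\theta_0,n}}{\sqrt n}\Big\|_{\mathbf B}^2 + \eta_n^2\Big)1_{\mathbf G}$$

$$\le \Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt n}\Big) + \nabla\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt n}\Big)\Big[\frac{h_{\theta_0,n}}{\sqrt n}\Big] + \Big\{2K_g\Big\|\frac{h_{\theta_0,n}}{\sqrt n}\Big\|_{\mathbf B}^2 + 2M_g\eta_n\Big\}1_{\mathbf G} , \qquad \text{(E.52)}$$

where the final result follows from Assumption 6.2(iii) and $2K_g\eta_n^2 \le M_g\eta_n$ for $n$ sufficiently large. Similarly, Assumption 6.2(ii) and Lemma E.4 yield

$$\nabla\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt n}\Big)\Big[\frac{h_{\theta_0,n}}{\sqrt n}\Big] \le \nabla\Upsilon_G(\theta_0)\Big[\frac{h_{\theta_0,n}}{\sqrt n}\Big] + \Big\|\nabla\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt n}\Big) - \nabla\Upsilon_G(\theta_0)\Big\|_o\Big\|\frac{h_{\theta_0,n}}{\sqrt n}\Big\|_{\mathbf B}1_{\mathbf G}$$

$$\le \nabla\Upsilon_G(\theta_0)\Big[\frac{h_{\theta_0,n}}{\sqrt n}\Big] + K_g\ell_n\Big\|\frac{h_{\theta_0,n}}{\sqrt n}\Big\|_{\mathbf B}1_{\mathbf G} . \qquad \text{(E.53)}$$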

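Hence, combining results (E.52) and (E.53), $\|h_{\theta_0,n}/\sqrt n\|_{\mathbf B} \le C_0K_d\eta_n$ due to $\|\bar h_{\theta_0,n}\|_{\mathbf B} \le K_d$, and $\eta_n \downarrow 0$, $\ell_n \downarrow 0$, we obtain that for $n$ sufficiently large we have

$$\Upsilon_G\Big(\theta_0 + \frac{h_{\theta_0,n}}{\sqrt n} + \frac{\tilde h}{\sqrt n}\Big) \le \Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt n}\Big) + \nabla\Upsilon_G(\theta_0)\Big[\frac{h_{\theta_0,n}}{\sqrt n}\Big] + 4M_g\eta_n 1_{\mathbf G} . \qquad \text{(E.54)}$$

In addition, since $C_0\eta_n \downarrow 0$, we have $C_0\eta_n \le 1$ eventually, and hence $\Upsilon_G(\theta_0 + h/\sqrt n) \le 0$, $2M_g\ell_n \le \epsilon/2$ for $n$ sufficiently large due to $\ell_n \downarrow 0$, and result (E.50) imply that

$$\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt n}\Big) + C_0\eta_n\nabla\Upsilon_G(\theta_0)[\bar h_{\theta_0,n}] \le C_0\eta_n\Big\{\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt n}\Big) + \nabla\Upsilon_G(\theta_0)[\bar h_{\theta_0,n}]\Big\} \le C_0\eta_n\{2M_g\ell_n - \epsilon\}1_{\mathbf G} \le -\frac{C_0\eta_n\epsilon}{2}1_{\mathbf G} . \qquad \text{(E.55)}$$

Thus, we can conclude from results (E.51), (E.54) and (E.55), and $C_0 > 8M_g/\epsilon$ that

$$\Upsilon_G\Big(\theta_0 + \frac{h_{\theta_0,n}}{\sqrt n} + \frac{\tilde h}{\sqrt n}\Big) \le \Big\{4M_g - \frac{C_0\epsilon}{2}\Big\}\eta_n 1_{\mathbf G} \le 0 , \qquad \text{(E.56)}$$

for $n$ sufficiently large, which establishes the claim of the Lemma. ■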

Lemma E.4. If $\mathbf A$ is an AM space with norm $\|\cdot\|_{\mathbf A}$ and unit $1_{\mathbf A}$, and $a_1, a_2 \in \mathbf A$, then it follows that $a_1 \le a_2 + C1_{\mathbf A}$ for any $a_1, a_2 \in \mathbf A$ satisfying $\|a_1 - a_2\|_{\mathbf A} \le C$.

PROOF: Since $\mathbf A$ is an AM space with unit $1_{\mathbf A}$ we have that $\|a_1 - a_2\|_{\mathbf A} \le C$ implies $|a_1 - a_2| \le C1_{\mathbf A}$, and hence the claim follows trivially from $a_1 - a_2 \le |a_1 - a_2|$. ■
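
As a concrete illustration of Lemma E.4, one may take the AM space to be $\mathbf R^d$ with the sup norm, whose unit is the vector of ones. The Python sketch below checks the coordinate-wise inequality in that special case; the function names and sample vectors are illustrative assumptions and do not appear in the paper.

```python
def sup_norm(a):
    """Sup norm on R^d, the AM-space norm assumed for this illustration."""
    return max(abs(x) for x in a)


def am_space_bound_holds(a1, a2, c):
    """Check a1 <= a2 + c * 1 in the coordinate-wise order of R^d."""
    return all(x <= y + c for x, y in zip(a1, a2))


# Hypothetical vectors; c is chosen as the sup-norm distance between them.
a1 = [0.3, -1.2, 2.0]
a2 = [0.1, -1.0, 2.5]
c = sup_norm([x - y for x, y in zip(a1, a2)])

lemma_conclusion = am_space_bound_holds(a1, a2, c)
```

The lemma only guarantees the inequality once $C$ is at least the norm distance; with a smaller (here negative) constant the coordinate-wise bound can fail.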

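Lemma E.5. Let $\mathbf A$ and $\mathbf C$ be Banach spaces with norms $\|\cdot\|_{\mathbf A}$ and $\|\cdot\|_{\mathbf C}$, $\mathbf A = \mathbf A_1 \oplus \mathbf A_2$ and $F : \mathbf A \to \mathbf C$. Suppose $F(a_0) = 0$ and that there are $\epsilon_0 > 0$ and $K_0 < \infty$ such that:

(i) $F : \mathbf A \to \mathbf C$ is Fréchet differentiable at all $a \in B_{\epsilon_0}(a_0) \equiv \{a \in \mathbf A : \|a - a_0\|_{\mathbf A} \le \epsilon_0\}$.

(ii) $\|F(a + h) - F(a) - \nabla F(a)[h]\|_{\mathbf C} \le K_0\|h\|_{\mathbf A}^2$ for all $a, a + h \in B_{\epsilon_0}(a_0)$.

(iii) $\|\nabla F(a_1) - \nabla F(a_2)\|_o \le K_0\|a_1 - a_2\|_{\mathbf A}$ for all $a_1, a_2 \in B_{\epsilon_0}(a_0)$.

(iv) $\nabla F(a_0) : \mathbf A \to \mathbf C$ has $\|\nabla F(a_0)\|_o \le K_0$.

(v) $\nabla F(a_0) : \mathbf A_2 \to \mathbf C$ is bijective and $\|\nabla F(a_0)^{-1}\|_o \le K_0$.

Then, for all $h_1 \in \mathbf A_1$ with $\|h_1\|_{\mathbf A} \le \{\frac{\epsilon_0}{2} \wedge (4K_0^2)^{-1} \wedge 1\}^2$ there is a unique $h_2^\star(h_1) \in \mathbf A_2$ with $F(a_0 + h_1 + h_2^\star(h_1)) = 0$. In addition, $h_2^\star(h_1)$ satisfies $\|h_2^\star(h_1)\|_{\mathbf A} \le 4K_0^2\|h_1\|_{\mathbf A}$ for arbitrary $\mathbf A_1$, and $\|h_2^\star(h_1)\|_{\mathbf A} \le 2K_0^2\|h_1\|_{\mathbf A}^2$ when $\mathbf A_1 = \mathcal N(\nabla F(a_0))$.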

PROOF: We closely follow the arguments in the proof of Theorem 4.B in Zeidler (1985). First, we define $g : \mathbf A_1 \times \mathbf A_2 \to \mathbf C$ pointwise for any $h_1 \in \mathbf A_1$ and $h_2 \in \mathbf A_2$ by

$$g(h_1, h_2) \equiv \nabla F(a_0)[h_2] - F(a_0 + h_1 + h_2) . \qquad \text{(E.57)}$$

Since $\nabla F(a_0) : \mathbf A_2 \to \mathbf C$ is bijective by hypothesis, $F(a_0 + h_1 + h_2) = 0$ if and only if

$$h_2 = \nabla F(a_0)^{-1}[g(h_1, h_2)] . \qquad \text{(E.58)}$$

Letting $T_{h_1} : \mathbf A_2 \to \mathbf A_2$ be given by $T_{h_1}(h_2) = \nabla F(a_0)^{-1}[g(h_1, h_2)]$, we see from (E.58) that the desired $h_2^\star(h_1)$ must be a fixed point of $T_{h_1}$. Next, define the set

$$M_0 \equiv \{h_2 \in \mathbf A_2 : \|h_2\|_{\mathbf A} \le \delta_0\} \qquad \text{(E.59)}$$

for $\delta_0 \equiv \frac{\epsilon_0}{2} \wedge (4K_0^2)^{-1} \wedge 1$, and consider an arbitrary $h_1 \in \mathbf A_1$ with $\|h_1\|_{\mathbf A} \le \delta_0^2$. Notice that then $a_0 + h_1 + h_2 \in B_{\epsilon_0}(a_0)$ for any $h_2 \in M_0$ and hence $g$ is differentiable with respect to $h_2$ with derivative $\nabla_2 g(h_1, h_2) = \nabla F(a_0) - \nabla F(a_0 + h_1 + h_2)$. Thus, if $h_2, \tilde h_2 \in M_0$, then Proposition 7.3.2 in Luenberger (1969) implies that

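$$\|g(h_1, h_2) - g(h_1, \tilde h_2)\|_{\mathbf C} \le \sup_{0 < \tau < 1}\|\nabla_2 g(h_1, h_2 + \tau(\tilde h_2 - h_2))\|_o\|h_2 - \tilde h_2\|_{\mathbf A}$$

$$= \sup_{0 < \tau < 1}\|\nabla F(a_0) - \nabla F(a_0 + h_1 + h_2 + \tau(\tilde h_2 - h_2))\|_o\|h_2 - \tilde h_2\|_{\mathbf A} \le \frac{1}{2K_0}\|h_2 - \tilde h_2\|_{\mathbf A} , \qquad \text{(E.60)}$$

where the final inequality follows by Condition (iii) and $\delta_0^2 \le \delta_0 \le (4K_0^2)^{-1}$. Moreover,

$$\|\nabla F(a_0)[h_2] - \nabla F(a_0 + h_1)[h_2]\|_{\mathbf C} \le \|\nabla F(a_0) - \nabla F(a_0 + h_1)\|_o\|h_2\|_{\mathbf A} \le K_0\|h_1\|_{\mathbf A}\|h_2\|_{\mathbf A} \le \frac{\|h_2\|_{\mathbf A}}{4K_0} \qquad \text{(E.61)}$$

by Condition (iii) and $\|h_1\|_{\mathbf A} \le \delta_0^2 \le (4K_0^2)^{-1}$. Similarly, for any $h_2 \in M_0$ we have

$$\|F(a_0 + h_1 + h_2) - F(a_0 + h_1) - \nabla F(a_0 + h_1)[h_2]\|_{\mathbf C} \le K_0\|h_2\|_{\mathbf A}^2 \le \frac{\|h_2\|_{\mathbf A}}{4K_0} \qquad \text{(E.62)}$$

due to $a_0 + h_1 \in B_{\epsilon_0}(a_0)$ and Condition (ii). In turn, since $F(a_0) = 0$ by hypothesis, Conditions (ii) and (iv), $\|h_1\|_{\mathbf A} \le \delta_0^2$, and $\delta_0 \le (4K_0^2)^{-1}$ yield that

$$\|F(a_0 + h_1)\|_{\mathbf C} = \|F(a_0 + h_1) - F(a_0)\|_{\mathbf C} \le K_0\|h_1\|_{\mathbf A}^2 + \|\nabla F(a_0)\|_o\|h_1\|_{\mathbf A} \le \frac{\delta_0}{2K_0} . \qquad \text{(E.63)}$$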

Hence, by (E.57) and (E.61)-(E.63) we obtain for any $h_2 \in M_0$ and $h_1$ with $\|h_1\|_{\mathbf A} \le \delta_0^2$

$$\|g(h_1, h_2)\|_{\mathbf C} \le \frac{\|h_2\|_{\mathbf A}}{2K_0} + \frac{\delta_0}{2K_0} \le \frac{\delta_0}{K_0} . \qquad \text{(E.64)}$$

Thus, since $\|\nabla F(a_0)^{-1}\|_o \le K_0$ by Condition (v), result (E.64) implies $T_{h_1} : M_0 \to M_0$, and (E.60) yields $\|T_{h_1}(h_2) - T_{h_1}(\tilde h_2)\|_{\mathbf A} \le 2^{-1}\|h_2 - \tilde h_2\|_{\mathbf A}$ for any $h_2, \tilde h_2 \in M_0$. By Theorem 1.1.1.A in Zeidler (1985) we then conclude $T_{h_1}$ has a unique fixed point $h_2^\star(h_1) \in M_0$, and the first claim of the Lemma follows from (E.57) and (E.58).

Next, we note that since $h_2^\star(h_1)$ is a fixed point of $T_{h_1}$, we can conclude that

$$\|h_2^\star(h_1)\|_{\mathbf A} = \|T_{h_1}(h_2^\star(h_1))\|_{\mathbf A} \le \|T_{h_1}(h_2^\star(h_1)) - T_{h_1}(0)\|_{\mathbf A} + \|T_{h_1}(0)\|_{\mathbf A} . \qquad \text{(E.65)}$$

Thus, since (E.60) and $\|\nabla F(a_0)^{-1}\|_o \le K_0$ imply that $\|T_{h_1}(h_2^\star(h_1)) - T_{h_1}(0)\|_{\mathbf A} \le 2^{-1}\|h_2^\star(h_1)\|_{\mathbf A}$, it follows from result (E.65) and $T_{h_1}(0) = -\nabla F(a_0)^{-1}F(a_0 + h_1)$ that

$$\frac12\|h_2^\star(h_1)\|_{\mathbf A} \le \|T_{h_1}(0)\|_{\mathbf A} \le K_0\|F(a_0 + h_1)\|_{\mathbf C} \le K_0\{K_0\|h_1\|_{\mathbf A}^2 + \|\nabla F(a_0)\|_o\|h_1\|_{\mathbf A}\} \le 2K_0^2\|h_1\|_{\mathbf A} , \qquad \text{(E.66)}$$

where in the second inequality we exploited $\|\nabla F(a_0)^{-1}\|_o \le K_0$, in the third inequality we used (E.63) and in the final inequality we exploited $\|h_1\|_{\mathbf A} \le 1$. While the estimate in (E.66) applies for generic $\mathbf A_1$, we note that if in addition $\mathbf A_1 = \mathcal N(\nabla F(a_0))$, then

$$\frac12\|h_2^\star(h_1)\|_{\mathbf A} \le \|T_{h_1}(0)\|_{\mathbf A} \le K_0\|F(a_0 + h_1)\|_{\mathbf C} \le K_0^2\|h_1\|_{\mathbf A}^2 , \qquad \text{(E.67)}$$

due to $F(a_0) = 0$ and $\nabla F(a_0)[h_1] = 0$, and thus the final claim of the Lemma follows. ■
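
The contraction argument in the proof of Lemma E.5 can be traced numerically in a toy case. The sketch below assumes $\mathbf A = \mathbf R^2$, $\mathbf C = \mathbf R$, $a_0 = 0$, and the hypothetical map $F(x, y) = y + x^2 + y^2$, for which $\mathbf A_1 = \mathcal N(\nabla F(a_0))$ is the first coordinate axis and Conditions (ii)-(v) hold with $K_0 = 2$ for any $\epsilon_0$; none of these choices come from the paper. It iterates the map $T_{h_1}$ and leaves the comparison with the bound $2K_0^2\|h_1\|^2$ to the reader.

```python
import math

K0 = 2.0  # constant for which Conditions (ii)-(v) hold for this toy F (assumption)


def F(x, y):
    """Hypothetical smooth map with F(0, 0) = 0 and gradient (0, 1) at the origin."""
    return y + x * x + y * y


def solve_h2(h1, iters=60):
    """Iterate T_{h1}(h2) = grad_2 F(a0)^{-1}[g(h1, h2)], where
    g(h1, h2) = grad F(a0)[h2] - F(a0 + h1 + h2) and grad_2 F(a0) = 1 here."""
    h2 = 0.0
    for _ in range(iters):
        h2 = h2 - F(h1, h2)
    return h2


h1 = 1.0 / 256.0  # equals ((4 * K0**2) ** -1) ** 2, the radius allowed when eps0 >= 1/8
h2_star = solve_h2(h1)
closed_form = (-1.0 + math.sqrt(1.0 - 4.0 * h1 * h1)) / 2.0  # root of y + h1^2 + y^2 = 0
```

Because $T_{h_1}$ is a contraction with modulus at most $1/2$ on $M_0$, a few dozen iterations already reach machine precision in this example.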


Appendix F - Motivating Examples Details 


In this Appendix, we revisit Examples 2.1, 2.2, 2.3, and 2.4 in order to illustrate our results. We focus in particular on deriving explicit expressions for the test and bootstrap statistics $I_n(R)$ and $U_n(R)$, clarifying the role of the norms $\|\cdot\|_{\mathbf E}$, $\|\cdot\|_{\mathbf L}$, and $\|\cdot\|_{\mathbf B}$, as well as computing the rate requirements imposed by our Assumptions.

Discussion of Example 2.1 

Since in this example we require $g_0$ to be continuously differentiable to evaluate the Slutsky restriction, it is natural to set the Banach space $\mathbf B$ to equal $\mathbf B = C^1(\mathbf R^2_+) \times \mathbf R^{d_w}$. Further recall that in this instance $Z_i = (P_i, Y_i, W_i)$, $X_i = (Q_i, Z_i)$, and

$$\rho(X_i, \theta) = Q_i - g(P_i, Y_i) - W_i'\gamma \qquad \text{(F.1)}$$

for any $(g, \gamma) = \theta \in \mathbf B$. For simplicity, we assume the support of $(P_i, Y_i)$ under $P$ is bounded uniformly in $P \in \mathbf P$ and for some $C_0 < \infty$ set the parameter space $\Theta$ to be

$$\Theta \equiv \{(g, \gamma) \in \mathbf B : \|g\|_{2,\infty} \le C_0 \text{ and } \|\gamma\|_2 \le C_0\} , \qquad \text{(F.2)}$$

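which is compact under the norm $\|\theta\|_{\mathbf B} = \|g\|_{1,\infty} \vee \|\gamma\|_2$ - for calculations with noncompact $\Theta$ see Examples 2.3 and 2.4 below. In order to approximate the function $g_0$ we utilize linear sieves $\{p_{j,n}\}_{j=1}^{j_n}$ and let $p_n^{j_n}(P_i, Y_i) \equiv (p_{1,n}(P_i, Y_i), \ldots, p_{j_n,n}(P_i, Y_i))'$. For $T : \mathbf R^{d_w} \to \mathbf R^{d_w}$ a bounded transformation we then set $q_n^{k_n}(Z_i) = (T(W_i)', p_n^{j_n}(P_i, Y_i)')'$ as our instruments so that $k_n = j_n + d_w$. Therefore, $I_n(R)$ is here equivalent to

$$I_n(R) = \inf_{(\beta, \gamma)}\Big\|\frac{1}{\sqrt n}\sum_{i=1}^n\{Q_i - p_n^{j_n}(P_i, Y_i)'\beta - W_i'\gamma\}q_n^{k_n}(Z_i)\Big\|_{\hat\Sigma_n, r}$$

$$\text{s.t. (i) } \|\gamma\|_2 \vee \|p_n^{j_n\prime}\beta\|_{2,\infty} \le C_0, \quad \text{(ii) } p_n^{j_n}(p_0, y_0)'\beta = c_0,$$

$$\text{(iii) } \frac{\partial}{\partial p}p_n^{j_n}(p, y)'\beta + p_n^{j_n}(p, y)'\beta\,\frac{\partial}{\partial y}p_n^{j_n}(p, y)'\beta \le 0 , \qquad \text{(F.3)}$$

where constraint (i) imposes that $(p_n^{j_n\prime}\beta, \gamma) \in \Theta$, restriction (ii) corresponds to $\Upsilon_F(\theta) = 0$, and (iii) enforces the Slutsky constraint $\Upsilon_G(\theta) \le 0$.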
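
To make the objective in (F.3) concrete, the following Python sketch evaluates it at a candidate $(\beta, \gamma)$ on toy data. It is only an illustration under stated assumptions: $r = 2$, identity weighting in place of $\hat\Sigma_n$, a hypothetical three-term polynomial sieve, and an identity instrument transformation; the constraints (i)-(iii) are not imposed, and none of the names below come from the paper.

```python
import math


def sieve(p, y):
    # Hypothetical low-order polynomial sieve p_n^{j_n}(p, y) with j_n = 3.
    return [1.0, p, y]


def criterion(data, beta, gamma, transform=lambda w: list(w)):
    """Euclidean norm of (1/sqrt(n)) * sum_i residual_i * q(Z_i), i.e. the
    objective in (F.3) with r = 2 and identity weighting (both assumptions)."""
    n = len(data)
    total = [0.0] * (len(gamma) + len(beta))
    for q_i, p_i, y_i, w_i in data:
        basis = sieve(p_i, y_i)
        resid = (q_i
                 - sum(b * s for b, s in zip(beta, basis))
                 - sum(g * w for g, w in zip(gamma, w_i)))
        instruments = transform(w_i) + basis  # q(Z_i) = (T(W_i)', p(P_i, Y_i)')'
        for m, value in enumerate(instruments):
            total[m] += resid * value
    return math.sqrt(sum(t * t for t in total)) / math.sqrt(n)


# Toy data generated without noise from beta0 = (2, -1, 0.5) and gamma0 = (0.3,).
beta0, gamma0 = [2.0, -1.0, 0.5], [0.3]
points = [(0.5, 1.0, [1.0]), (1.0, 2.0, [0.0]), (1.5, 0.5, [2.0]), (2.0, 1.5, [-1.0])]
data = [(sum(b * s for b, s in zip(beta0, sieve(p, y))) + gamma0[0] * w[0], p, y, w)
        for p, y, w in points]

value_at_truth = criterion(data, beta0, gamma0)
value_elsewhere = criterion(data, [2.0, -0.5, 0.5], gamma0)
```

In practice the infimum in (F.3) would be taken by a constrained optimizer over $(\beta, \gamma)$; the function above only supplies the objective such a routine would minimize.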

Whenever $\{p_{j,n}\}_{j=1}^{j_n}$ are chosen to be a tensor product of B-splines or local polynomials it follows that $\sup_{(p,y)}\|p_n^{j_n}(p, y)\|_2 \lesssim \sqrt{j_n}$ and hence Assumption 3.2(i) holds with $B_n \asymp \sqrt{j_n}$ since $T(W_i)$ was assumed bounded (Belloni et al., 2015). Moreover, we note Assumption 3.3(ii) holds if $\sup_{P \in \mathbf P}E_P[\|W_i\|_2^2 + Q_i^2] < \infty$, while Assumption 3.3(iii) is satisfied with $J_n = O(1)$ by Theorem 2.7.1 in van der Vaart and Wellner (1996). By Remark 4.2, it also follows that if the eigenvalues of the matrix

$$E_P[q_n^{k_n}(Z_i)(W_i', p_n^{j_n}(P_i, Y_i)')] \qquad \text{(F.4)}$$

are bounded from above and away from zero uniformly in $P \in \mathbf P$, then Assumption 4.2(ii) is satisfied with $\nu_n \asymp j_n^{1/2 - 1/r}$ and $\|\theta\|_{\mathbf E} = \sup_{P \in \mathbf P}\|g\|_{L^2_P} + \|\gamma\|_2$. Since we expect $(g_0, \gamma_0)$ to be identified, we set $\tau_n = 0$ and thus under the no-bias condition of Assumption 5.3(ii) Theorem 4.1 yields a rate of convergence under $\|\cdot\|_{\mathbf E}$ equal to

$$\mathcal R_n = \frac{j_n\sqrt{\log(j_n)}}{\sqrt n} . \qquad \text{(F.5)}$$


We refer to Corollary G.1 for verifying Assumption 5.1 and also note that Assumption 5.2 is satisfied with $\kappa_\rho = 1$ and $K_\rho = 2(1 + \sup_{P \in \mathbf P}E_P[\|W_i\|_2^2])$ by (F.1) and definition of $\|\cdot\|_{\mathbf E}$. In turn, we note that since $\mathcal F_n$ is an Euclidean class, we also have

$$\sup_{P \in \mathbf P}J_{[\,]}(\mathcal R_n, \mathcal F_n, \|\cdot\|_{L^2_P}) \lesssim \sqrt{j_n}\,\mathcal R_n\log(n) \qquad \text{(F.6)}$$

and thus $B_n \lesssim \sqrt{j_n}$ and result (F.5) imply that Assumption 5.3(i) holds provided that $j_n^{2 + 1/r}\log^2(n) = o(a_n\sqrt n)$. Since equation (F.1) implies that in this model


$$m_P(\theta)(Z_i) \equiv E_P[Q_i | Z_i] - g(P_i, Y_i) - W_i'\gamma , \qquad \text{(F.7)}$$

we also observe that Assumption 5.4 holds with $\nabla m_P(\theta)[h](Z_i) = -g(P_i, Y_i) - W_i'\gamma$ for any $(g, \gamma) = h \in \mathbf B$, $K_m = 0$, and $M_m = 1 + \sup_{P \in \mathbf P}E_P[\|W_i\|_2]$.

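With regards to the bootstrap statistic, we let $(\hat\beta, \hat\gamma)$ be a minimizer of (F.3) and, for $\hat\theta = (p_n^{j_n\prime}\hat\beta, \hat\gamma)$, note that

$$\hat{\mathbb W}_n\rho(\cdot, \hat\theta) * q_n^{k_n} = \frac{1}{\sqrt n}\sum_{i=1}^n\omega_i\Big\{(Q_i - p_n^{j_n}(P_i, Y_i)'\hat\beta - W_i'\hat\gamma)q_n^{k_n}(Z_i) - \frac1n\sum_{j=1}^n(Q_j - p_n^{j_n}(P_j, Y_j)'\hat\beta - W_j'\hat\gamma)q_n^{k_n}(Z_j)\Big\} , \qquad \text{(F.8)}$$

where recall $\{\omega_i\}_{i=1}^n$ is an i.i.d. sample of standard normal random variables that are independent of the data $\{Q_i, P_i, Y_i, W_i\}_{i=1}^n$. Since in this model the moment conditions are linear in $\theta$, in this case the numerical derivative in (63) simply reduces to

$$\hat{\mathbb D}_n(\hat\theta)[h] = -\frac1n\sum_{i=1}^n(p_n^{j_n}(P_i, Y_i)'\beta + W_i'\gamma)q_n^{k_n}(Z_i) \qquad \text{(F.9)}$$

for any $h = (p_n^{j_n\prime}\beta, \gamma) \in \mathbf B_n$. We also note that result (F.7) similarly implies that

$$\mathbb D_{n,P}(\theta_0)[h] = -E_P[(p_n^{j_n}(P_i, Y_i)'\beta + W_i'\gamma)q_n^{k_n}(Z_i)] \qquad \text{(F.10)}$$

for $h = (p_n^{j_n\prime}\beta, \gamma) \in \mathbf B_n$. Moreover, since the requirement that the eigenvalues of (F.4) be bounded away from zero and infinity implies that similarly the eigenvalues of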




(F.ll) 
are bounded away from zero and infinity uniformly in $P \in \mathbf P$, it follows that $\|h\|_{\mathbf E} \asymp \|\gamma\|_2 + \|\beta\|_2$ for any $h = (p_n^{j_n\prime}\beta, \gamma) \in \mathbf B_n$. We therefore obtain from (F.9) and (F.10) that

sup j|B n (<9)[/i]-D n ,p(6 | o)Wllr 
IHIe<1 

1 U 

< • (F. 12 ) 

n z — J 

i =1 

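Thus, since $\sup_{(p,y)}\|p_n^{j_n}(p, y)\|_2 \lesssim \sqrt{j_n}$ we conclude by standard arguments and Theorem 6.1 in Tropp (2012) (Bernstein's inequality for matrices) that uniformly in $P \in \mathbf P$

$$\sup_{\|h\|_{\mathbf E} \le 1}\|\hat{\mathbb D}_n(\hat\theta)[h] - \mathbb D_{n,P}(\theta_0)[h]\|_r = O_p\Big(\frac{j_n\sqrt{\log(j_n)}}{\sqrt n}\Big) . \qquad \text{(F.13)}$$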


We next note that given the definitions of $\|\cdot\|_{\mathbf E}$ and $\|\cdot\|_{\mathbf B}$, Assumption 6.1(i) is trivially satisfied with $K_b = 2$. Furthermore, since in this example $\mathbf G = C(\mathbf R^2_+)$, we obtain by defining for any $(g_1, \gamma_1) = \theta_1 \in \mathbf B$ the map $\nabla\Upsilon_G(\theta_1) : \mathbf B \to \mathbf G$ according to

$$\nabla\Upsilon_G(\theta_1)[h](p, y) = \frac{\partial}{\partial p}g(p, y) + g_1(p, y)\frac{\partial}{\partial y}g(p, y) + g(p, y)\frac{\partial}{\partial y}g_1(p, y) \qquad \text{(F.14)}$$

for any $(g, \gamma) = h \in \mathbf B$, that Assumptions 6.2(i)-(ii) hold with $K_g = 2$. Similarly, exploiting (F.14) and the definition of $\Theta$ in (F.2) it also follows that Assumption 6.2(iii) is satisfied with $M_g = 1 + 2C_0$. In turn, we observe that since $\Upsilon_F : \mathbf B \to \mathbf F$ is linear (with $\mathbf F = \mathbf R$), Assumption 6.4 is automatically satisfied, while Assumption 6.3 holds with $K_f = 0$ and $M_f = 1$. Sufficient conditions for Assumption 6.5 are given by Theorem H.1, and we note the preceding discussion implies Assumption 6.6(ii)
imposes jh +1 ^' log 2 {n)l n = o(a n ) while Assumptions 6 . 6 (iii)-(iv) respectively demand 
in fog(jn) = o(nrl) and J^log(in) = o(n) because S„(B,E) < j?J 2 (Newey, 1997). 
These rate requirements are compatible with setting $\ell_n$ to satisfy $\mathcal R_n\mathcal S_n(\mathbf B, \mathbf E) = o(\ell_n)$, and hence result (F.13), $\nu_n \asymp j_n^{1/2 - 1/r}$, and the eigenvalues of (F.4) being bounded away from zero imply that the conditions of Lemma 6.1 are satisfied. Therefore, the bandwidth $\ell_n$ is unnecessary - i.e. we may set $\ell_n = +\infty$ - and hence $U_n(R)$ becomes


U n (R) = inf \\W n p(-J)*q^+B n (0)[(n,pfc'P)]\\ t s.t. (i) pfr(po,yo)'P = 0, 

(’UP) 

(ii) T g (0 + ^) < (T G (0) - 2r n \&h,o 0 ) V (-r n l G ) , (F.15) 
'n \ n 


where constraints (i) and (ii) correspond to $\Upsilon_F(\hat\theta + h/\sqrt n) = 0$ and $h/\sqrt n \in G_n(\hat\theta)$ in definition (69). It is worth noting that if constraints (i) and (ii) in (F.15) are replaced by more demanding restrictions, then the test would continue to control size.








For computational simplicity, it may hence be preferable to replace constraint (ii) with 

j n f n - 3/2 

T G {0 + < (T G (0) - 2 r -^\\(3\\ 2 ) V (-r„ 1 G ) (F.16) 

\Jn y/n 

where we have exploited that $\|p_n^{j_n\prime}\beta\|_{1,\infty} \lesssim j_n^{3/2}\|\beta\|_2$ by $\sup_{P \in \mathbf P}\|p_n^{j_n\prime}\beta\|_{L^2_P} \asymp \|\beta\|_2$ and $\mathcal S_n(\mathbf B, \mathbf E) \lesssim j_n^{3/2}$. Finally, we observe that in a model with endogeneity the arguments would remain similar, except the rate of convergence $\mathcal R_n$ would be slower, leading to different requirements on $r_n$; see Chen and Christensen (2013) for the optimal $\mathcal R_n$. ■

Discussion of Example 2.2 

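In the monotonic regression discontinuity example the Banach space $\mathbf B$ was set to equal $\mathbf B = C^1([-1, 0]) \times C^1([0, 1])$ while $(g_-, g_+) = \theta_0 \in \mathbf B$ satisfied the restriction

$$E[Y_i - g_-(R_i)(1 - D_i) - g_+(R_i)D_i | R_i, D_i] = 0 . \qquad \text{(F.17)}$$

For the parameter space $\Theta$ we may for instance set $\Theta$ to be a $C_0$-ball in $\mathbf B$, so that

$$\Theta \equiv \{(g_1, g_2) \in \mathbf B : \|g_1\|_{1,\infty} \vee \|g_2\|_{1,\infty} \le C_0\} \qquad \text{(F.18)}$$

for $C_0$ sufficiently large to ensure $(g_-, g_+) \in \Theta$. In turn, we employ linear sieves $\{p_{-,j,n}\}_{j=1}^{j_n}$ and $\{p_{+,j,n}\}_{j=1}^{j_n}$ for $C^1([-1, 0])$ and $C^1([0, 1])$ respectively and set the vector of instruments $q_n^{k_n}(R_i, D_i) \equiv ((1 - D_i)p_{-,n}^{j_n}(R_i)', D_ip_{+,n}^{j_n}(R_i)')'$ where $k_n = 2j_n$, $p_{-,n}^{j_n} \equiv (p_{-,1,n}, \ldots, p_{-,j_n,n})'$, and $p_{+,n}^{j_n} \equiv (p_{+,1,n}, \ldots, p_{+,j_n,n})'$. Thus, $I_n(R)$ becomes

$$I_n(R) = \inf_{(\beta_1, \beta_2)}\Big\|\frac{1}{\sqrt n}\sum_{i=1}^n\{Y_i - (1 - D_i)p_{-,n}^{j_n}(R_i)'\beta_1 - D_ip_{+,n}^{j_n}(R_i)'\beta_2\}q_n^{k_n}(R_i, D_i)\Big\|_{\hat\Sigma_n, r}$$

$$\text{s.t. (i) } p_{-,n}^{j_n}(0)'\beta_1 - p_{+,n}^{j_n}(0)'\beta_2 = 0, \quad \text{(ii) } \nabla p_{-,n}^{j_n\prime}\beta_1 \ge 0, \quad \text{(iii) } \nabla p_{+,n}^{j_n\prime}\beta_2 \ge 0 \qquad \text{(F.19)}$$

where constraint (i) corresponds to $\Upsilon_F(\theta) = 0$, constraints (ii) and (iii) impose $\Upsilon_G(\theta) \le 0$, and the restriction $(p_{-,n}^{j_n\prime}\beta_1, p_{+,n}^{j_n\prime}\beta_2) \in \Theta$ can be ignored by Remark 6.3.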

For concreteness, suppose $\{p_{-,j,n}\}_{j=1}^{j_n}$ and $\{p_{+,j,n}\}_{j=1}^{j_n}$ are orthonormalized B-splines, in which case the constant $B_n$ of Assumption 3.2(i) satisfies $B_n \lesssim \sqrt{j_n}$. Moreover, we note that Theorem 2.7.1 in van der Vaart and Wellner (1996) implies the sequence $J_n$ of Assumption 3.3(iii) satisfies $J_n = O(1)$. In turn, provided that the eigenvalues of

$$E_P[q_n^{k_n}(R_i, D_i)q_n^{k_n}(R_i, D_i)'] \qquad \text{(F.20)}$$

are bounded from above and away from zero uniformly in $P \in \mathbf P$, Remark 4.2 implies Assumption 4.2 holds with $\nu_n \asymp j_n^{1/2 - 1/r}$ when using for any $(g_1, g_2) = \theta \in \mathbf B$ the norm $\|\theta\|_{\mathbf E} = \sup_{P \in \mathbf P}\|g_1\|_{L^2_P} + \sup_{P \in \mathbf P}\|g_2\|_{L^2_P}$. We further note that since $(g_-, g_+)$ is identified we may set $\tau_n = 0$ and hence, under the no bias condition of Assumption 5.3(ii), we obtain from Theorem 4.1 a rate of convergence under $\|\cdot\|_{\mathbf E}$ equal to

$$\mathcal R_n = \frac{j_n\sqrt{\log(j_n)}}{\sqrt n} . \qquad \text{(F.21)}$$

We conjecture the above rate is suboptimal in that it does not exploit the linearity of the moment condition in $\theta$, and employing the arguments in Belloni et al. (2015) it should be possible to derive the refined rate $\mathcal R_n = \sqrt{j_n\log(j_n)}/\sqrt n$ at least for the case $r = 2$.

Sufficient conditions for Assumption 5.1 are provided by Corollary G.1, while we observe Assumption 5.2 holds with $\kappa_\rho = 1$ by linearity of the moment condition and definition of $\|\cdot\|_{\mathbf E}$. Furthermore, since $\mathcal F_n$ is an Euclidean class, we further obtain

$$\sup_{P \in \mathbf P}J_{[\,]}(\mathcal R_n, \mathcal F_n, \|\cdot\|_{L^2_P}) \lesssim \sqrt{j_n}\,\mathcal R_n\log(n) , \qquad \text{(F.22)}$$

and hence under the bound $B_n \lesssim \sqrt{j_n}$ and $\mathcal R_n \lesssim j_n\sqrt{\log(j_n)}/\sqrt n$ by (F.21), Assumption 5.3(i) reduces to $j_n^{2 + 1/r}\log^2(n) = o(a_n\sqrt n)$. In turn, we note that

$$m_P(\theta)(Z_i) \equiv E_P[Y_i - g_1(R_i)(1 - D_i) - g_2(R_i)D_i | R_i, D_i] \qquad \text{(F.23)}$$


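is linear for any $(g_1, g_2) \in \mathbf B$, and hence Assumptions 5.4(i)-(ii) hold with $K_m = 0$, while Assumption 5.4(iii) is satisfied with $M_m = 1$. Similarly, if we metrize the product topology on $\mathbf B = C^1([-1, 0]) \times C^1([0, 1])$ by $\|\theta\|_{\mathbf B} = \|g_1\|_{1,\infty} \vee \|g_2\|_{1,\infty}$ for any $(g_1, g_2) = \theta \in \mathbf B$, then Assumption 6.1(i) holds with $K_b = 2$, Assumptions 6.2 and 6.3 are satisfied with $K_g = K_f = 0$ and $M_g = 1$ and $M_f = 2$, and Assumption 6.4 holds by linearity of $\Upsilon_F$. Therefore, for any $(g_1, g_2) = \theta \in \mathbf B$ the set $G_n(\theta)$ becomes

$$G_n(\theta) = \Big\{\Big(\frac{p_{-,n}^{j_n\prime}\beta_1}{\sqrt n}, \frac{p_{+,n}^{j_n\prime}\beta_2}{\sqrt n}\Big) : \begin{array}{ll}\nabla g_1(a) + \dfrac{\nabla p_{-,n}^{j_n}(a)'\beta_1}{\sqrt n} \ge \nabla g_1(a) \wedge r_n & \forall a \in [-1, 0] \\[2mm] \nabla g_2(a) + \dfrac{\nabla p_{+,n}^{j_n}(a)'\beta_2}{\sqrt n} \ge \nabla g_2(a) \wedge r_n & \forall a \in [0, 1]\end{array}\Big\} \qquad \text{(F.24)}$$

where we exploited $1_{\mathbf G} = (1_{C([-1,0])}, 1_{C([0,1])})$ for $1_{C([-1,0])}$ and $1_{C([0,1])}$ respectively the constant functions equal to one on $[-1, 0]$ and $[0, 1]$.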
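
The membership condition in (F.24) is easy to check on a grid. The Python sketch below does so for the left-hand piece only, assuming a hypothetical monomial sieve $p(a) = (1, a, a^2, \ldots)$ and a finite grid on $[-1, 0]$ in place of "for all $a$"; these choices, the function names, and the numerical tolerance are illustrative assumptions rather than part of the paper.

```python
import math


def dpoly(a, coef):
    """Derivative at a of the polynomial sum_j coef[j] * a**j."""
    return sum(j * c * a ** (j - 1) for j, c in enumerate(coef) if j > 0)


def in_Gn_piece(theta_coef, beta, n, r_n, grid):
    """Check grad g(a) + grad p(a)'beta / sqrt(n) >= min(grad g(a), r_n) on a grid,
    i.e. one of the two blocks of constraints defining G_n(theta) in (F.24)."""
    root_n = math.sqrt(n)
    return all(
        dpoly(a, theta_coef) + dpoly(a, beta) / root_n
        >= min(dpoly(a, theta_coef), r_n) - 1e-12
        for a in grid
    )


grid = [-1.0 + i / 10.0 for i in range(11)]  # grid on [-1, 0]
steep = [0.0, 1.0, 0.0]  # g(a) = a, slope 1: the constraint is slack by 1 - r_n
flat = [0.0, 0.0, 0.0]   # g(a) = 0, slope 0: the constraint binds

ok_small_move = in_Gn_piece(steep, [0.0, -4.0, 0.0], n=100, r_n=0.5, grid=grid)
too_large_move = in_Gn_piece(steep, [0.0, -6.0, 0.0], n=100, r_n=0.5, grid=grid)
binding_down = in_Gn_piece(flat, [0.0, -1.0, 0.0], n=100, r_n=0.5, grid=grid)
binding_up = in_Gn_piece(flat, [0.0, 1.0, 0.0], n=100, r_n=0.5, grid=grid)
```

The example shows the role of $r_n$: where the estimated slope exceeds $r_n$ the local parameter may lower it down to $r_n$, whereas where the slope is (near) zero only non-negative perturbations of the slope are admitted.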


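With regards to other elements needed to construct the bootstrap statistic, we next let $(\hat\beta_1, \hat\beta_2)$ be a minimizer of (F.19) and for $\hat\theta = (p_{-,n}^{j_n\prime}\hat\beta_1, p_{+,n}^{j_n\prime}\hat\beta_2)$ note that

$$\hat{\mathbb W}_n\rho(\cdot, \hat\theta) * q_n^{k_n} = \frac{1}{\sqrt n}\sum_{i=1}^n\omega_i\Big\{(Y_i - (1 - D_i)p_{-,n}^{j_n}(R_i)'\hat\beta_1 - D_ip_{+,n}^{j_n}(R_i)'\hat\beta_2)q_n^{k_n}(R_i, D_i)$$

$$\qquad - \frac1n\sum_{j=1}^n(Y_j - (1 - D_j)p_{-,n}^{j_n}(R_j)'\hat\beta_1 - D_jp_{+,n}^{j_n}(R_j)'\hat\beta_2)q_n^{k_n}(R_j, D_j)\Big\} . \qquad \text{(F.25)}$$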
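
The multiplier construction in (F.25) amounts to re-weighting the centered moment contributions by independent standard normal draws. A minimal Python sketch follows; it assumes the per-observation vectors $\rho(X_i, \hat\theta)q_n^{k_n}(Z_i)$ have already been computed, and the function name and toy inputs are hypothetical.

```python
import math
import random


def multiplier_process(moments, weights):
    """Return (1/sqrt(n)) * sum_i w_i * (m_i - mean(m)), where m_i is the vector
    rho(X_i, theta_hat) * q(Z_i) and w_i are the bootstrap multipliers."""
    n = len(moments)
    k = len(moments[0])
    mean = [sum(m[c] for m in moments) / n for c in range(k)]
    return [
        sum(w * (m[c] - mean[c]) for w, m in zip(weights, moments)) / math.sqrt(n)
        for c in range(k)
    ]


# Toy moment contributions (n = 2, k_n = 2) and one draw of Gaussian multipliers.
toy_moments = [[1.0, 0.0], [3.0, 2.0]]
rng = random.Random(0)
omega = [rng.gauss(0.0, 1.0) for _ in toy_moments]
one_draw = multiplier_process(toy_moments, omega)
```

Repeating the draw of the multipliers while holding the data fixed yields the bootstrap distribution from which critical values are computed; constant multipliers return the zero vector because the contributions are centered.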
Since the moment condition is linear in $\theta$, there is no need to employ a numerical derivative to estimate $\mathbb D_{n,P}(\theta_0)$ and hence following Remark 6.1 we just set

$$\hat{\mathbb D}_n(\hat\theta)[h] = -\frac1n\sum_{i=1}^n\{(1 - D_i)p_{-,n}^{j_n}(R_i)'\beta_1 + D_ip_{+,n}^{j_n}(R_i)'\beta_2\}q_n^{k_n}(R_i, D_i) \qquad \text{(F.26)}$$

for any $h = (p_{-,n}^{j_n\prime}\beta_1, p_{+,n}^{j_n\prime}\beta_2)$. Analogously, in this instance $\mathbb D_{n,P}(\theta_0) : \mathbf B_n \to \mathbf R^{k_n}$ equals

$$\mathbb D_{n,P}(\theta_0)[h] = -E_P[\{(1 - D_i)p_{-,n}^{j_n}(R_i)'\beta_1 + D_ip_{+,n}^{j_n}(R_i)'\beta_2\}q_n^{k_n}(R_i, D_i)] \qquad \text{(F.27)}$$

for any $h = (p_{-,n}^{j_n\prime}\beta_1, p_{+,n}^{j_n\prime}\beta_2)$. Provided the eigenvalues of (F.20) are bounded from above and away from zero, and recalling $\nu_n \asymp j_n^{1/2 - 1/r}$, it is then straightforward to show $\|h\|_{\mathbf E} \lesssim \nu_n\|\mathbb D_{n,P}(\theta_0)[h]\|_r$ for any $h \in \mathbf B_n$. Moreover, by direct calculation


sup ||B n (0)[/t] — H> n! p(9o)[h]\\ r 
IWIb<1 


< 


1 

-^^(A,A)^(A,A)' 

n z —' 


f; p [^(A, A)^"(A, A) , ]||o.2 (F.28) 


and hence from Theorem 6.1 in Tropp (2012) we can conclude that uniformly in P E Pq 


sup ||B n ( 0 )[/l] -lD„ l p( 0 o)[/»]||r 
IW|b<1 


Op{ 


jn \/log(j?i 


n 


(F.29) 


In particular, (74) holds provided j ^ 2 1//f \/log(j n ) = o(y / n), and by Lemma 6.1 and 
Remark 6.4 it follows that the bandwidth £ n can be ignored if it is possible to set 
4 | 0 such that lZ n = o{£ n ). The additional requirements on £ n are dictated by 
Assumption 6 . 6 , which here become j]j r yJ\og(j n ) yjjn sup PgP J[ ] (£ n , F ni \\-\\ L 2 p ) = o(a n ). 
Since we have shown j}J r ydog {j n )VJn sup PgP J[](77 n , R n , || • \\l 2 p ) = o(a n ), we conclude 
lZ n = o(£ n ) is feasible, and hence in this example we may employ 


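$$\hat V_n(\hat\theta, +\infty) = \Big\{\Big(\frac{p_{-,n}^{j_n\prime}\beta_1}{\sqrt n}, \frac{p_{+,n}^{j_n\prime}\beta_2}{\sqrt n}\Big) \in G_n(\hat\theta) : p_{-,n}^{j_n}(0)'\beta_1 - p_{+,n}^{j_n}(0)'\beta_2 = 0\Big\} . \qquad \text{(F.30)}$$

Thus, combining (F.24), (F.25), (F.26), and (F.30), the bootstrap statistic becomes

$$U_n(R) = \inf_{(\beta_1, \beta_2)}\|\hat{\mathbb W}_n\rho(\cdot, \hat\theta) * q_n^{k_n} + \hat{\mathbb D}_n(\hat\theta)[(p_{-,n}^{j_n\prime}\beta_1, p_{+,n}^{j_n\prime}\beta_2)]\|_{\hat\Sigma_n, r}$$

$$\text{s.t. (i) } \Big(\frac{p_{-,n}^{j_n\prime}\beta_1}{\sqrt n}, \frac{p_{+,n}^{j_n\prime}\beta_2}{\sqrt n}\Big) \in G_n(\hat\theta), \quad \text{(ii) } p_{-,n}^{j_n}(0)'\beta_1 - p_{+,n}^{j_n}(0)'\beta_2 = 0 . \qquad \text{(F.31)}$$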


Finally, we note that Theorem H.1 provides sufficient conditions for Assumption 6.5, while using the bound $\mathcal S_n(\mathbf B, \mathbf E) \lesssim j_n^{3/2}$ from Newey (1997) and (F.21) implies Assumption 6.6(iii) is satisfied provided $j_n^5\log(n) = o(nr_n^2)$. ■
Discussion of Example 2.3

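Recall that in this application the parameter $\theta$ consists of a finite dimensional component $(\gamma_1, \gamma_2, \alpha) \in \mathbf R^{2d_w + d_\alpha}$ and a nonparametric function $\delta \in C(\mathbf R^{d_y})$. For notational simplicity, we let $(\gamma_1, \gamma_2, \alpha) = \pi \in \mathbf R^{d_\pi}$ with $d_\pi = 2d_w + d_\alpha$, and define the function

$$M_1(Z_i, \theta) \equiv \int 1\{W_i'\gamma_1 + \epsilon_1 \ge 0, \; W_i'\gamma_2 + \delta(Y_i) + \epsilon_2 \le 0, \; W_i'\gamma_1 + \epsilon_1 \ge W_i'\gamma_2 + \epsilon_2\}dG(\epsilon|\alpha) ,$$

which constitutes the part of (16) that depends on $\theta$. Similarly, we further define

$$M_2(Z_i, \theta) \equiv \int 1\{\epsilon_1 \le -W_i'\gamma_1, \; \epsilon_2 \le -W_i'\gamma_2\}dG(\epsilon|\alpha) \qquad \text{(F.32)}$$

$$M_3(Z_i, \theta) \equiv \int 1\{\epsilon_1 + \delta(Y_i) \ge -W_i'\gamma_1, \; \epsilon_2 + \delta(Y_i) \ge -W_i'\gamma_2\}dG(\epsilon|\alpha) , \qquad \text{(F.33)}$$

which correspond to the moment conditions in (17) and (18). For $A_i$ the observed bundle purchased by agent $i$ let $X_i = (A_i, Z_i)$, and then note that the generalized residuals are

$$\rho_1(X_i, \theta) \equiv 1\{A_i = (1, 0)\} - M_1(Z_i, \theta)$$

$$\rho_2(X_i, \theta) \equiv 1\{A_i = (0, 0)\} - M_2(Z_i, \theta)$$

$$\rho_3(X_i, \theta) \equiv 1\{A_i = (1, 1)\} - M_3(Z_i, \theta) , \qquad \text{(F.34)}$$

so that $\rho(X_i, \theta) = (\rho_1(X_i, \theta), \rho_2(X_i, \theta), \rho_3(X_i, \theta))'$ for a total of $\mathcal J = 3$ restrictions.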

In this instance, $\mathbf B = \mathbf R^{d_\pi} \times C(\mathbf R^{d_y})$ and for illustrative purposes we select a noncompact parameter space by setting $\Theta = \mathbf B$. For $\{p_{j,n}\}_{j=1}^{j_n}$ a sequence of linear sieves in $C(\mathbf R^{d_y})$ and $p_n^{j_n}(y) = (p_{1,n}(y), \ldots, p_{j_n,n}(y))'$ we then let $\Theta_n$ be given by

$$\Theta_n \equiv \{(\pi, \delta) \in \mathbf R^{d_\pi} \times C(\mathbf R^{d_y}) : \|\pi\|_2 \le C_0 \text{ and } \delta = p_n^{j_n\prime}\beta \text{ for some } \|\beta\|_2 \le C_n\} \qquad \text{(F.35)}$$

for some constant $C_0 < \infty$ and sequence $\{C_n\}_{n=1}^\infty$ satisfying $C_n \uparrow \infty$. In turn, for $1 \le j \le \mathcal J$ we let $\{q_{k,n,j}\}_{k=1}^{k_{n,j}}$ denote a sequence of transformations of $Z_i = (W_i, Y_i)$, and recall $q_n^{k_n}(z) = (q_{n,1}^{k_{n,1}}(z)', q_{n,2}^{k_{n,2}}(z)', q_{n,3}^{k_{n,3}}(z)')'$ where $q_{n,j}^{k_{n,j}}(z) = (q_{1,n,j}(z), \ldots, q_{k_{n,j},n,j}(z))'$ and $k_n = k_{n,1} + k_{n,2} + k_{n,3}$. We note, however, that since the conditioning variable is the same for all three moment conditions, in this instance we may in fact let $q_{n,1}^{k_{n,1}} = q_{n,2}^{k_{n,2}} = q_{n,3}^{k_{n,3}}$, i.e. employ the same transformation of $Z_i$ for all three moment conditions. Thus, given the above specifications, the test statistic $I_n(R)$ is equivalent to

$$I_n(R) = \inf_{(\pi, \beta)}\Big\|\frac{1}{\sqrt n}\sum_{i=1}^n\rho(X_i, (\pi, p_n^{j_n\prime}\beta)) * q_n^{k_n}(Z_i)\Big\|_{\hat\Sigma_n, r}$$

$$\text{s.t. (i) } p_n^{j_n\prime}\beta \le 0, \quad \text{(ii) } \|\pi\|_2 \le C_0, \quad \text{(iii) } \|\beta\|_2 \le C_n , \qquad \text{(F.36)}$$

where constraint (i) corresponds to $\Upsilon_G(\theta) \le 0$, while constraints (ii) and (iii) impose that $(\pi, p_n^{j_n\prime}\beta) = \theta \in \Theta_n$. The latter two restrictions are standard sieve compactness conditions imposed in (even parametric) nonconvex estimation problems.

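Next, let $M(Z_i, \theta) \equiv (M_1(Z_i, \theta), M_2(Z_i, \theta), M_3(Z_i, \theta))'$ which we will assume to be differentiable, with $\nabla_\pi M(Z_i, \theta)$ denoting the derivative with respect to $\pi$, and

$$\nabla_\delta M(Z_i, \theta) \equiv \frac{\partial}{\partial\tau}M(Z_i, \theta + \tau e_\delta)\Big|_{\tau = 0} \qquad \text{(F.37)}$$

for $e_\delta \in \mathbf B$ equal to $(0, 1_{C(\mathbf R^{d_y})})$ and $1_{C(\mathbf R^{d_y})}$ the constant function that equals one everywhere. For $\|\cdot\|_F$ the Frobenius norm of a matrix, we further define

$$F_\pi(z) \equiv \sup_{\theta \in \Theta}\|\nabla_\pi M(z, \theta)\|_F \qquad F_\delta(z) \equiv \sup_{\theta \in \Theta}\|\nabla_\delta M(z, \theta)\|_2 , \qquad \text{(F.38)}$$

and then observe that by the mean value theorem and the Cauchy-Schwarz inequality

$$|M_j(Z_i, (\pi_1, p_n^{j_n\prime}\beta_1)) - M_j(Z_i, (\pi_2, p_n^{j_n\prime}\beta_2))|$$

$$\le F_\pi(Z_i)\|\pi_1 - \pi_2\|_2 + F_\delta(Z_i)\|p_n^{j_n}(Y_i)'(\beta_1 - \beta_2)\|_2$$

$$\le \{F_\pi(Z_i) + \|p_n^{j_n}(Y_i)\|_2F_\delta(Z_i)\}\{\|\pi_1 - \pi_2\|_2 + \|\beta_1 - \beta_2\|_2\} \qquad \text{(F.39)}$$

for any $1 \le j \le \mathcal J$. Defining $F_{\theta,n}(z) \equiv \{F_\pi(z) + \|p_n^{j_n}(y)\|_2F_\delta(z)\}$ and assuming that $\sup_{P \in \mathbf P}E_P[F_\pi(Z_i)^2] < \infty$, $F_\delta(z)$ is bounded uniformly in $z$, and that the matrix

$$E_P[p_n^{j_n}(Y_i)p_n^{j_n}(Y_i)'] \qquad \text{(F.40)}$$

has eigenvalues bounded away from zero and infinity uniformly in $n$ and $P \in \mathbf P$, it then follows that $\sup_{P \in \mathbf P}\|F_{\theta,n}\|_{L^2_P} \lesssim \sqrt{j_n}$. Therefore, by result (F.39) and Theorem 2.7.11 in van der Vaart and Wellner (1996) we can conclude that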

sup N u (e,F n , || • \\ L] ,) < . (F.41) 

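Hence, since $\int_0^a\log(M/u)du = a\log(M/a) + a$ and $C_n\sqrt{j_n} \uparrow \infty$, result (F.41) yields

$$\sup_{P \in \mathbf P}J_{[\,]}(\eta, \mathcal F_n, \|\cdot\|_{L^2_P}) \lesssim \int_0^\eta\Big\{1 + j_n\log\Big(\frac{C_n\sqrt{j_n}}{\epsilon}\Big)\Big\}^{1/2}d\epsilon$$

$$\lesssim \sqrt{j_n}\int_0^\eta\log\Big(\frac{C_n\sqrt{j_n}}{\epsilon}\Big)d\epsilon = \sqrt{j_n} \times \Big\{\eta\log\Big(\frac{C_n\sqrt{j_n}}{\eta}\Big) + \eta\Big\} . \qquad \text{(F.42)}$$

In particular, since $F(X_i) = 1$ is an envelope for $\mathcal F_n$, we obtain by setting $\eta = 1$ in (F.42) that Assumption 3.3(iii) holds with $J_n \asymp \sqrt{j_n}\log(C_nj_n)$.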
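
The integral identity $\int_0^a\log(M/u)du = a\log(M/a) + a$ used to obtain (F.42) can be confirmed numerically. The short Python sketch below compares a midpoint-rule approximation with the closed form; the values of $a$ and $M$ and the number of grid points are arbitrary illustrative choices.

```python
import math


def entropy_integral(a, M, steps=20000):
    """Midpoint-rule approximation of the integral of log(M / u) over (0, a]."""
    h = a / steps
    return sum(math.log(M / ((i + 0.5) * h)) for i in range(steps)) * h


def closed_form(a, M):
    """Closed form a * log(M / a) + a used in deriving (F.42)."""
    return a * math.log(M / a) + a


gap = abs(entropy_integral(1.0, 5.0) - closed_form(1.0, 5.0))
```

The integrand has an integrable logarithmic singularity at zero, so the midpoint rule converges slowly near the origin but still matches the closed form to several digits.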

We next study the rate of convergence under the assumption that the model is identified and hence set $\tau_n = 0$ - sufficient conditions for identification are provided by Fox and Lazzati (2014). For any $(\pi, \delta) = \theta \in \mathbf B$, we then define the norm $\|\theta\|_{\mathbf E} = \|\pi\|_2 + \sup_{P \in \mathbf P}\|\delta\|_{L^2_P}$ and note that since the eigenvalues of $E_P[p_n^{j_n}(Y_i)p_n^{j_n}(Y_i)']$ were assumed to be bounded away from zero and infinity, Remark 4.2 implies that Assumption 4.2 holds with $\nu_n \asymp k_n^{1/2 - 1/r}$ provided that the smallest singular value of the matrix


$$E_P\Big[\Big(q_n^{k_n}(Z_i) * \nabla_\pi M(Z_i, \theta), \;\; q_n^{k_n}(Z_i) * \nabla_\delta M(Z_i, \theta)p_n^{j_n}(Y_i)'\Big)\Big] \qquad \text{(F.43)}$$

is bounded away from zero uniformly in $\theta \in (\Theta_{0n}(P) \cap R)^\epsilon$, $n$, and $P \in \mathbf P_0$. Therefore, assuming $\|q_{k,n,j}\|_{L^\infty_P}$ is uniformly bounded in $k$, $n$, $j$, and $P \in \mathbf P$ for simplicity, we obtain that under Assumption 5.3(ii) the rate $\mathcal R_n$ delivered by Theorem 4.1 becomes

$$\mathcal R_n = \frac{\sqrt{k_nj_n\log(k_n)} \times \log(C_nj_n)}{\sqrt n} , \qquad \text{(F.44)}$$

where we exploited $\nu_n \asymp k_n^{1/2 - 1/r}$ and that as previously argued $J_n \asymp \sqrt{j_n}\log(C_nj_n)$.

Corollary G.1 provides sufficient conditions for verifying Assumption 5.1, while the definition of $M(Z_i, \theta)$, equation (F.38), the mean value theorem, and the Cauchy-Schwarz inequality imply for any $\theta_1 = (\pi_1, p_n^{j_n\prime}\beta_1) \in \Theta_n$ and $\theta_2 = (\pi_2, p_n^{j_n\prime}\beta_2) \in \Theta_n$ that

$$E_P[\|\rho(X_i, \theta_1) - \rho(X_i, \theta_2)\|_2^2] = E_P[\|M(Z_i, \theta_1) - M(Z_i, \theta_2)\|_2^2]$$

$$\le 6E_P[F_\pi(Z_i)^2\|\pi_1 - \pi_2\|_2^2 + F_\delta(Z_i)^2(p_n^{j_n}(Y_i)'(\beta_1 - \beta_2))^2] \lesssim \|\theta_1 - \theta_2\|_{\mathbf E}^2 , \qquad \text{(F.45)}$$

where in the final inequality we exploited that $\sup_{P \in \mathbf P}E_P[F_\pi(Z_i)^2] < \infty$ and $F_\delta(z)$ is uniformly bounded by hypothesis. Hence, we conclude from (F.45) that Assumption 5.2 holds with $\kappa_\rho = 1$, and combining (F.42) and (F.44) we obtain that

$$\frac{j_nk_n^{1/r + 1/2}\log(k_n)\log^2(C_nj_nn)}{\sqrt n} = o(a_n) \qquad \text{(F.46)}$$

implies Assumption 5.3(i) holds. Unlike in Examples 2.1 and 2.2, however, $\rho(X_i, \theta)$ is nonlinear in $\theta$ and hence Assumption 5.4 is harder to verify. To this end, recall $m_{P,j}(\theta) \equiv E_P[\rho_j(X_i, \theta)|Z_i]$, and for any $(\pi, \delta) = h \in \mathbf B$ define

$$\nabla m_{P,j}(\theta)[h] = \nabla_\pi M_j(Z_i, \theta)\pi + \nabla_\delta M_j(Z_i, \theta)\delta(Y_i) , \qquad \text{(F.47)}$$

which we note satisfies Assumption 5.4(iii) with $M_m = \sup_{P \in \mathbf P}\|F_\pi\|_{L^2_P} + \|F_\delta\|_\infty$. Next, we suppose that for any $\theta_1, \theta_2 \in (\Theta_{0n}(P) \cap R)^\epsilon$ with $\theta_1 = (\pi_1, \delta_1)$ and $\theta_2 = (\pi_2, \delta_2)$

$$\|\nabla_\pi M(Z_i, \theta_1) - \nabla_\pi M(Z_i, \theta_2)\|_F \le G_\pi(Z_i)\|\pi_1 - \pi_2\|_2 + G_\delta\|\delta_1 - \delta_2\|_\infty$$

$$\|\nabla_\delta M(Z_i, \theta_1) - \nabla_\delta M(Z_i, \theta_2)\|_2 \le G_\delta\{\|\pi_1 - \pi_2\|_2 + \|\delta_1 - \delta_2\|_\infty\} \qquad \text{(F.48)}$$

for some function $G_\pi$ satisfying $\sup_{P \in \mathbf P}E_P[G_\pi(Z_i)^2] < \infty$ and a constant $G_\delta < \infty$ - a sufficient condition is that $M(Z_i, \theta)$ be twice continuously differentiable with respect to $\theta$ and that such derivatives be uniformly bounded. Exploiting results (F.47) and (F.48), we then obtain by the mean value theorem that for any $(\pi, \delta) = h \in \mathbf B$

$$\|m_{P,j}(\theta + h) - m_{P,j}(\theta) - \nabla m_{P,j}(\theta)[h]\|_{L^2_P} \le \Big\{\|\pi\|_2 + \sup_{P \in \mathbf P}\|\delta\|_{L^2_P}\Big\} \times \Big\{\Big(G_\delta + \sup_{P \in \mathbf P}\|G_\pi\|_{L^2_P}\Big)\|\pi\|_2 + G_\delta\|\delta\|_\infty\Big\} . \qquad \text{(F.49)}$$

Therefore, setting $K_m = G_\delta + \sup_{P \in \mathbf P}\|G_\pi\|_{L^2_P}$ we conclude that Assumption 5.4(i) holds with the norm $\|\theta\|_{\mathbf L} = \|\pi\|_2 + \|\delta\|_\infty$ for any $(\pi, \delta) = \theta \in \mathbf B$. Identical arguments as in (F.49) further verify Assumption 5.4(ii) for the same choice of $K_m$ and $\|\cdot\|_{\mathbf L}$. Because $\|\cdot\|_{\mathbf L}$ in fact metrizes the product topology in $\mathbf B$, in this example we actually have $\mathbf B = \mathbf L$ and $\|\cdot\|_{\mathbf B} = \|\cdot\|_{\mathbf L}$. Moreover, since the smallest eigenvalue of $E_P[p_n^{j_n}(Y_i)p_n^{j_n}(Y_i)']$ was assumed to be bounded away from zero uniformly in $P \in \mathbf P$, we also obtain that

$$\mathcal S_n(\mathbf B, \mathbf E) = \mathcal S_n(\mathbf L, \mathbf E) \lesssim \sup_{\beta \in \mathbf R^{j_n}}\frac{\|p_n^{j_n\prime}\beta\|_\infty}{\|\beta\|_2} \lesssim \sqrt{j_n} , \qquad \text{(F.50)}$$

where the final inequality applies when $\{p_{j,n}\}_{j=1}^{j_n}$ are Fourier, Spline, or Wavelet series since then $\sup_y\|p_n^{j_n}(y)\|_2 \lesssim \sqrt{j_n}$ (Belloni et al., 2015; Chen and Christensen, 2013). If $\{p_{j,n}\}_{j=1}^{j_n}$ are polynomial series instead, then (F.50) holds with $j_n$ in place of $\sqrt{j_n}$.

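Now turning to the construction of the bootstrap statistic, we let $(\hat\pi, \hat\beta)$ be a minimizer of (F.36), and setting $\hat\theta = (\hat\pi, p_n^{j_n\prime}\hat\beta)$ we then define

$$\hat{\mathbb W}_n\rho(\cdot, \hat\theta) * q_n^{k_n} = \frac{1}{\sqrt n}\sum_{i=1}^n\omega_i\Big\{\rho(X_i, \hat\theta) * q_n^{k_n}(Z_i) - \frac1n\sum_{j=1}^n\rho(X_j, \hat\theta) * q_n^{k_n}(Z_j)\Big\} \qquad \text{(F.51)}$$

for $\rho(X_i, \theta)$ as defined in (F.34). Since $\rho(X_i, \theta)$ is differentiable in $\theta$ we do not employ a numerical derivative, but instead follow Remark 6.1 and set for any $(\pi, \delta) = h \in \mathbf B_n$

$$\hat{\mathbb D}_n(\hat\theta)[h] \equiv \frac1n\sum_{i=1}^n\{\nabla_\pi M(Z_i, \hat\theta)\pi + \nabla_\delta M(Z_i, \hat\theta)\delta(Y_i)\} * q_n^{k_n}(Z_i) . \qquad \text{(F.52)}$$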


Next, we note that Assumption 6.1(i) holds with $K_b = 1$, while linearity of $\Upsilon_G$ implies Assumptions 6.2(i)-(ii) hold with $K_g = 0$, while Assumption 6.2(iii) is satisfied with $M_g = 1$ by direct calculation. In turn, since no equality restrictions are present in this problem, Assumptions 6.3 and 6.4 are not needed - formally they are automatically satisfied by setting $\mathbf F = \mathbf R$ and letting $\Upsilon_F(\theta) = 0$ for all $\theta \in \mathbf B$. Thus, here we have

$$G_n(\hat\theta) = \Big\{\Big(\frac{\pi}{\sqrt n}, \frac{p_n^{j_n\prime}\beta}{\sqrt n}\Big) \in \mathbf B_n : \frac{p_n^{j_n}(y)'\beta}{\sqrt n} \le \max\{0, -p_n^{j_n}(y)'\hat\beta - r_n\} \text{ for all } y\Big\} . \qquad \text{(F.53)}$$

Hence, since $K_g = K_f = 0$, according to Remark 6.4 we may set $\hat V_n(\hat\theta, \ell_n)$ to equal

$$\hat V_n(\hat\theta, \ell_n) = \Big\{\frac{h}{\sqrt n} \in \mathbf B_n : \frac{h}{\sqrt n} \in G_n(\hat\theta) \text{ and } \Big\|\frac{h}{\sqrt n}\Big\|_{\mathbf E} \le \ell_n\Big\} . \qquad \text{(F.54)}$$

Finally, we observe that Theorem H.1 provides sufficient conditions for Assumption 6.5, while Assumption 6.6(i) is automatically satisfied since $\tau_n = 0$. Moreover, exploiting results (F.42), (F.44), and (F.50), it follows that Assumption 6.6(ii) reduces to

4 X {(j n n ) 1/4 V y/jnkn log (k n ) log(^^)} = o(a n ) (F.55) 

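and Assumptions 6.6(iii)-(iv) are satisfied whenever $j_n\sqrt{k_n\log(k_n)}\log(C_nj_n) = o(\sqrt n\,r_n)$. Furthermore, since the eigenvalues of (F.40) have been assumed to be bounded away from zero and infinity, it follows that $\|h\|_{\mathbf E} \asymp \|\pi\|_2 \vee \|\beta\|_2$ uniformly in $(\pi, p_n^{j_n\prime}\beta) = h \in \mathbf B_n$ and $n$. Therefore, from results (F.53) and (F.54), the bootstrap statistic equals

$$U_n(R) = \inf_{(\pi, \beta)}\|\hat{\mathbb W}_n\rho(\cdot, \hat\theta) * q_n^{k_n} + \hat{\mathbb D}_n(\hat\theta)[(\pi, p_n^{j_n\prime}\beta)]\|_{\hat\Sigma_n, r}$$

$$\text{s.t. (i) } \frac{p_n^{j_n}(y)'\beta}{\sqrt n} \le \max\{0, -p_n^{j_n}(y)'\hat\beta - r_n\} \;\; \forall y, \quad \text{(ii) } \|\pi\|_2 \vee \|\beta\|_2 \le \sqrt n\,\ell_n . \qquad \text{(F.56)}$$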


Alternatively, under slightly stronger requirements, it is possible to appeal to Lemma 6.1 to conclude that the bandwidth $\ell_n$ is unnecessary - i.e. the second constraint in (F.56) can be ignored. To this end, we note that for any $(\pi, p_n^{j_n\prime}\beta) = h \in \mathbf B_n$, we have

$$\mathbb D_{n,P}(\theta)[h] = E_P[\{\nabla_\pi M(Z_i, \theta)\pi + \nabla_\delta M(Z_i, \theta)p_n^{j_n}(Y_i)'\beta\} * q_n^{k_n}(Z_i)] . \qquad \text{(F.57)}$$

Since $\nu_n \asymp k_n^{1/2 - 1/r}$, $\|a\|_2 \le k_n^{1/2 - 1/r}\|a\|_r$ for any $a \in \mathbf R^{k_n}$, and we assumed the smallest singular value of (F.43) and the largest eigenvalue of (F.40) are respectively bounded away from zero and infinity uniformly in $\theta \in (\Theta_{0n}(P) \cap R)^\epsilon$, $n$, and $P \in \mathbf P_0$, we obtain

$$\|h\|_{\mathbf E} \lesssim \nu_n\|\mathbb D_{n,P}(\theta)[h]\|_r \qquad \text{(F.58)}$$

for any 9 G (©on(P) F R) e , P G Po, and h G B n . In order to verify (74), we define the 
class Gs,n = {g ■ g{z) = V sM 3 (z, 9)q k , n ,j( z )pt {y)' P for some 9 G Q n , ||/3||2 < 1, 1 < 
j < j n , 1 < k < k n , and 1 < j < J}. Since sup, y ||^”(y)|| 2 < as exploited in 

(F.50), and \\qk,n,j\\L^ was assumed to be uniformly bounded, it follows from (F.38) 
that FsVJ^K 0 is an envelope for Gs,n for Ko sufficiently large. Therefore, it follows that 

\[ E_P\Big[\sup_{g\in\mathcal G_{\delta,n}}|\mathbb G_n g|\Big] \lesssim J_{[\,]}(\|\sqrt{j_n}F_\delta K_0\|_{L^2_P},\mathcal G_{\delta,n},\|\cdot\|_{L^2_P}) \tag{F.59} \]

by Theorem 2.14.2 in van der Vaart and Wellner (1996). Furthermore, exploiting once
again that $\sup_y\|p_n^{j_n}(y)\|_2\lesssim\sqrt{j_n}$ and condition (F.48) we can conclude

| VsMj(Zi, ejqkMiZdpfcWP! - V 5 M 3 {Zi, e 2 )q k ,nA Z i)pt(Yi )%I 

^ II^IIooIIp^'^i - /3 2 )||oo + {||tti - 7r 2 || 2 + Wpt'Uh - @ 2 ) 11 00 } WPri' @2 11 00 
^ yfaWPi - /3 2 ||2 + jn{||vri - 7T 2 || 2 + ||/?1 - P 2 W 2 } • (F.60) 

for any 0\ = Pi) and 0 2 = (vr 2 ,p^ n/ /3 2 ). Thus, result (F.60), Theorem 2.7.11 

in van der Vaart and Wellner (1996), and arguing as in (F.41) and (F.42) implies 
suppgp II ' \\i?p) = 0{j n \og(C n j n )). Similarly, if we define g n>n = 

{g : g(z ) = p J> (y)^ nj (z)V 7r M ? (z, 0)7T for some 1 < j < j n , 1 < k < k ni 1 < j < 
J , 0 £ 0 n and ||7r|| 2 < 1}, then by analogous arguments we can conclude 

sup Ep[ sup \G n g\] = 0(j n \og{C n j n )) . (F.61) 

-PeP geG*, n 

Thus, employing the above results together with Markov’s inequality finally implies that 


sup sup ||ID) n (0)[/l] -Bn,p(0)[fr]||r 
0S0„ heB„:||/i|| E =l 

y/^n r l/n I I \fT n D ^ v^njn log(C , n J n ), 

< —— X { sup \G n g\ + sup |G n g|} = O p ( - j= -) . (F.62) 

V n g&Gn,n g&Gs, n v n 

Exploiting (F.58) and (F.62), it then follows that the conditions of Lemma 6.1 are 
satisfied and constraint (ii) in (F.56) can be ignored if £ n can be chosen to simultaneously 
satisfy Assumption 6.6 and £ n = o(TZ n ). These requirements are met provided that 


Kjn 2 log(A; n ) log 2 (C n j n ) 



(F.63) 


which represents a strengthening of (F.46); note that strengthening (F.46) to (F.63)
potentially makes Assumption 5.3(ii) less tenable. Hence, under (F.63) we may set


Un(R) = mf } ||W n p(-,0) * Cfc +BnW[(7T,^)]||^, r 


T? n (11 V Q • ^ 

s.t. " — < max{0, —p’niyYfi — r n } for all y , (F.64) 


n 


i.e. the second constraint in (F.56) may be ignored when computing the bootstrap 
critical values. ■ 


Discussion of Example 2.4 

An observation $i$ in this example was assumed to consist of an instrument $Z_i\in\mathbf R^{d_z}$
and a pair of individuals $j\in\{1,2\}$ for whom we observe the hospital $H_{ij}$ in the network
$\mathcal H$ that they were referred to, as well as for all hospitals $h\in\mathcal H$ the cost of treatment
$P_{ij}(h)$ and distance to hospital $D_{ij}(h)$. Ho and Pakes (2014) then derive

\[ E\Big[\sum_{j=1}^2\{\gamma_0(P_{ij}(H_{ij}) - P_{ij}(H_{ij'})) + g_0(D_{ij}(H_{ij})) - g_0(D_{ij}(H_{ij'}))\}\Big|Z_i\Big]\le 0 \tag{F.65} \]

where $\gamma_0\in\mathbf R$, $g_0:\mathbf R_+\to\mathbf R_+$ is an unknown monotonic function, and $j'\equiv\{1,2\}\setminus\{j\}$.
For notational simplicity, we let $X_i = (\{\{P_{ij}(h),D_{ij}(h)\}_{h\in\mathcal H},H_{ij}\}_{j=1}^2,Z_i)$ and define

\[ \psi(X_i,\gamma,g)\equiv\sum_{j=1}^2\{\gamma(P_{ij}(H_{ij}) - P_{ij}(H_{ij'})) + g(D_{ij}(H_{ij})) - g(D_{ij}(H_{ij'}))\} . \tag{F.66} \]

In addition, we assume that the supports of $D_{ij}(H_{ij})$ and $D_{ij}(H_{ij'})$ are contained in a
bounded set uniformly in $P\in\mathbf P$ and $j\in\{1,2\}$, which we normalize to $[0,1]$. Finally,
recall that in this example $\gamma_0$ is the parameter of interest, and that we aim to test

\[ H_0 : \gamma_0 = c_0 \qquad H_1 : \gamma_0\ne c_0 \tag{F.67} \]

employing the moment restriction (F.65) while imposing monotonicity of $g_0:\mathbf R_+\to\mathbf R_+$.

In order to map this problem into our framework, we follow the discussion of Example
2.4 in the main text, see equations (21)-(22), and rewrite restriction (F.65) as

\[ E[\psi(X_i,\gamma_0,g_0)|Z_i] + \lambda_0(Z_i) = 0 , \tag{F.68} \]

for a function $\lambda_0$ satisfying $\lambda_0(Z_i)\ge 0$. We further define the Hilbert space $L^2_U$ by

\[ L^2_U\equiv\Big\{f:[0,1]\to\mathbf R : \|f\|_{L^2_U}<\infty \text{ for } \|f\|^2_{L^2_U}\equiv\int_0^1 f^2(u)\,du\Big\} \tag{F.69} \]

and note $C^1([0,1])\subset L^2_U$. The parameter in this example is thus $\theta_0 = (\gamma_0,g_0,\lambda_0)$ which
we view as an element of $\mathbf B = \mathbf R\times C^1([0,1])\times\ell^\infty(\mathbf R^{d_z})$, and set as the residual

\[ \rho(X_i,\theta) = \psi(X_i,\gamma,g) + \lambda(Z_i) \tag{F.70} \]

for any $\theta = (\gamma,g,\lambda)$. The hypothesis in (F.67) can then be tested by letting $\mathbf F = \mathbf R$
and $\Upsilon_F(\theta) = \gamma - c_0$ for any $\theta\in\mathbf B$, and imposing the monotonicity constraint on $g$ and
positivity restriction on $\lambda$ by setting $\mathbf G = \ell^\infty([0,1])\times\ell^\infty(\mathbf R^{d_z})$ and $\Upsilon_G(\theta) = -(g',\lambda)$
for any $\theta\in\mathbf B$. Finally, as in Example 2.3 we utilize a noncompact parameter space and
let $\Theta = \mathbf B$, which together with the preceding discussion verifies the general structure
of our framework imposed in Assumptions 2.1 and 2.2.

To build the test statistic $I_n(R)$, we let $\{A_{k,n}\}_{k=1}^{k_n}$ denote a triangular array of
partitions of $\mathbf R^{d_z}$, and set $q_n^{k_n}(z)\equiv(1\{z\in A_{1,n}\},\ldots,1\{z\in A_{k_n,n}\})'$. For ease of
computation, it is convenient to also employ $\{1\{z\in A_{k,n}\}\}_{k=1}^{k_n}$ as a sieve for $\ell^\infty(\mathbf R^{d_z})$
and thus for a sequence $C_n\uparrow\infty$ we approximate $\ell^\infty(\mathbf R^{d_z})$ by the set

\[ \Lambda_n\equiv\{\lambda\in\ell^\infty(\mathbf R^{d_z}) : \lambda = q_n^{k_n\prime}\pi \text{ for some } \|\pi\|_2\le C_n\} . \tag{F.71} \]

In turn, for $\{p_{j,n}\}_{j=1}^{j_n}$ a triangular array of orthonormal functions in $L^2_U$ such as b-splines
or wavelets, and $p_n^{j_n}(u) = (p_{1,n}(u),\ldots,p_{j_n,n}(u))'$ we let the sieve for $\mathbf B$ be given by

\[ \Theta_n\equiv\{(\gamma,g,\lambda)\in\mathbf B : g = p_n^{j_n\prime}\beta,\ \|\beta\|_2\le C_n,\ \lambda\in\Lambda_n\} . \tag{F.72} \]


Given the stated parameter choices, the test statistic $I_n(R)$ is then equivalent to

\[ I_n(R) = \inf_{(\gamma,\beta,\pi)}\Big\|\frac{1}{\sqrt n}\sum_{i=1}^n\rho(X_i,(\gamma,p_n^{j_n\prime}\beta,q_n^{k_n\prime}\pi)) * q_n^{k_n}(Z_i)\Big\|_{\hat\Sigma_n,r} \]
\[ \text{s.t. (i) } \|\beta\|_2\le C_n,\ \text{(ii) } \gamma = c_0,\ \text{(iii) } \nabla p_n^{j_n\prime}\beta\ge 0,\ \text{(iv) } \pi\ge 0 , \tag{F.73} \]

where restriction (i) imposes that $\theta\in\Theta_n$ and the requirement $\|\pi\|_2\le C_n$ can be
ignored as argued in Remark 6.3, and restrictions (ii) and (iii)-(iv) respectively enforce
the equality ($\Upsilon_F(\theta) = 0$) and inequality ($\Upsilon_G(\theta)\le 0$) constraints. While we introduce
(F.73) due to its link to our general formulation in (26), it is actually more convenient
to work with a profiled version of $I_n(R)$. Specifically, the choice of sieve $\Lambda_n$ enables us
to easily profile the optimal $\pi\in\mathbf R^{k_n}$ in (F.73) for any given choice of $(\gamma,\beta)$ leading to

\[ I_n(R) = \inf_{(\gamma,\beta)}\Big\|\Big\{\frac{1}{\sqrt n}\sum_{i=1}^n\psi(X_i,\gamma,p_n^{j_n\prime}\beta) * q_n^{k_n}(Z_i)\Big\}\vee 0\Big\|_{\hat\Sigma_n,r} \]
\[ \text{s.t. (i) } \|\beta\|_2\le C_n,\ \text{(ii) } \gamma = c_0,\ \text{(iii) } \nabla p_n^{j_n\prime}\beta\ge 0 . \tag{F.74} \]
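The profiling step from (F.73) to (F.74) can be illustrated numerically. The sketch below is not part of the original argument: it assumes an identity weighting matrix in place of the estimated weight, treats the residuals $\psi(X_i,\gamma,g)$ as given numbers for a fixed $(\gamma,\beta)$, and uses hypothetical function names. Because the instruments are partition indicators, the $k$-th moment equals the scaled cell sum of $\psi$ plus a nonnegative multiple of $\pi_k$, so the optimal $\pi_k\ge 0$ offsets negative cell sums and leaves the positive part.

```python
import numpy as np

def cell_moments(psi, cell, k):
    """Scaled cell sums (1/sqrt(n)) * sum_i psi_i 1{Z_i in A_k} and cell counts."""
    n = psi.shape[0]
    sums = np.bincount(cell, weights=psi, minlength=k) / np.sqrt(n)
    counts = np.bincount(cell, minlength=k)
    return sums, counts

def moments_with_lambda(psi, cell, k, pi):
    """Moment vector of (F.73) when lambda = q'pi (identity weighting assumed)."""
    sums, counts = cell_moments(psi, cell, k)
    return sums + counts * pi / np.sqrt(psi.shape[0])

def profiled_pi(psi, cell, k):
    """Optimal pi >= 0: cancel negative cell sums, leave positive ones untouched."""
    sums, counts = cell_moments(psi, cell, k)
    pi = np.zeros(k)
    neg = (sums < 0) & (counts > 0)
    pi[neg] = -np.sqrt(psi.shape[0]) * sums[neg] / counts[neg]
    return pi

def profiled_moments(psi, cell, k):
    """Moment vector of (F.74): the positive part of the scaled cell sums."""
    sums, _ = cell_moments(psi, cell, k)
    return np.maximum(sums, 0.0)

# Simulated inputs (illustrative only): cell labels stand in for 1{Z_i in A_{k,n}}.
rng = np.random.default_rng(0)
n, k = 400, 5
cell = rng.integers(0, k, size=n)
psi = rng.normal(size=n)
pi_star = profiled_pi(psi, cell, k)
```

Under these assumptions the moment vector evaluated at `pi_star` coincides with `profiled_moments`, and no other nonnegative `pi` yields a smaller norm, which is the content of the reduction to (F.74).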


The ability to profile out the nuisance parameter $\lambda$ grants this problem an additional
structure that enables us to weaken some of our assumptions. In particular, the rate
of convergence of the minimizers of $I_n(R)$ in (F.74) is better studied through direct
arguments rather than a reliance on Theorem 4.1. To this end, let

\[ \Gamma_{0n}(P)\cap R\equiv\{(\gamma,g) : (\gamma,g,\lambda)\in\Theta_{0n}(P)\cap R \text{ for some } \lambda\in\ell^\infty(\mathbf R^{d_z})\} \]
\[ \Gamma_n\cap R\equiv\{(\gamma,g) : (\gamma,g,\lambda)\in\Theta_n\cap R \text{ for some } \lambda\in\ell^\infty(\mathbf R^{d_z})\} \tag{F.75} \]

denote the profiled set $\Theta_{0n}(P)\cap R$ and profiled sieve $\Theta_n\cap R$. For each $(\gamma,g)\in\Gamma_n\cap R$
we then denote the corresponding population and sample profiled $\lambda$ by

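\[ \lambda_{n,P}^{(\gamma,g)}(z)\equiv-\sum_{k=1}^{k_n}1\{z\in A_{k,n}\}\{E_P[\psi(X_i,\gamma,g)|Z_i\in A_{k,n}]\wedge 0\} \tag{F.76} \]
\[ \hat\lambda_n^{(\gamma,g)}(z)\equiv-\sum_{k=1}^{k_n}1\{z\in A_{k,n}\}\Big\{\frac{\sum_{i=1}^n1\{Z_i\in A_{k,n}\}\psi(X_i,\gamma,g)}{\sum_{i=1}^n1\{Z_i\in A_{k,n}\}}\wedge 0\Big\} , \tag{F.77} \]

and observe that $\Theta_{0n}(P)\cap R = \{(\gamma,g,\lambda_{n,P}^{(\gamma,g)}) : (\gamma,g)\in\Gamma_{0n}(P)\cap R\}$. Therefore, defining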


\[ \mathcal P_n(\gamma,g)\equiv\Big\|\Big\{\frac{1}{\sqrt n}\sum_{i=1}^n\psi(X_i,\gamma,g) * q_n^{k_n}(Z_i)\Big\}\vee 0\Big\|_{\hat\Sigma_n,r} , \tag{F.78} \]

we can then construct a set estimator for the profiled identified set $\Gamma_{0n}(P)\cap R$ by setting

\[ \hat\Gamma_n\cap R\equiv\Big\{(\gamma,g)\in\Gamma_n\cap R : \mathcal P_n(\gamma,g)\le\inf_{\theta\in\Theta_n\cap R}Q_n(\theta) + \tau_n\Big\} \tag{F.79} \]

for some $\tau_n\downarrow 0$. Next, we note that the collection of transformations $\{q_{k,n}\}_{k=1}^{k_n}$ is
orthogonal in $L^2_P$, yet not orthonormal. In order to normalize them, we suppose that

\[ \frac{1}{k_n}\asymp\inf_{P\in\mathbf P}\inf_{1\le k\le k_n}P(Z_i\in A_{k,n})\asymp\sup_{P\in\mathbf P}\sup_{1\le k\le k_n}P(Z_i\in A_{k,n}) \tag{F.80} \]

which implies $\|q_{k,n}\|_{L^2_P}\asymp k_n^{-1/2}$ uniformly in $P\in\mathbf P$, $1\le k\le k_n$, and $n$. Following the
discussion of Examples 2.1, 2.2, and 2.3 we further impose the condition

H{{ r y,g),Ton{P) n R, || • H 2 + || • I 

<k 1 ~^{\\E P {^(X i , 1 ,g)*q^(Z i )}y0\\ r + O(Cn)} • (F.81) 

Defining $\mathcal G_n\equiv\{\psi(x,\gamma,g)q_{k,n}(z) : (\gamma,g)\in\Gamma_n\cap R,\ 1\le k\le k_n\}$, next suppose that

\[ \|g\|^2_{L^2_U}\asymp E_P[g^2(D_{ij}(H_{ij}))]\asymp E_P[g^2(D_{ij}(H_{ij'}))] \tag{F.82} \]

uniformly in $g\in L^2_U$, $j\in\{1,2\}$, and $P\in\mathbf P$. If in addition $E_P[P_{ij}^2(H_{ij}) + P_{ij}^2(H_{ij'})]$
is bounded uniformly in $P\in\mathbf P$ and $j\in\{1,2\}$, then $\mathcal G_n$ has an envelope $G_n$ satisfying
$\sup_{P\in\mathbf P}\|G_n\|_{L^2_P}\lesssim C_n$. Arguing as in (F.42) it is then possible to show

\[ \sup_{P\in\mathbf P}J_{[\,]}(C_n,\mathcal G_n,\|\cdot\|_{L^2_P})\lesssim C_n\sqrt{j_n\log(k_n)} , \tag{F.83} \]

and hence letting $\mathcal P_{n,P}(\gamma,g)\equiv\|\sqrt n E_P[\psi(X_i,\gamma,g) * q_n^{k_n}(Z_i)]\vee 0\|_{\Sigma_n(P),r}$ for any $(\gamma,g)\in\mathbf R\times L^2_U$ we obtain by Theorem 2.14.2 in van der Vaart and Wellner (1996) that

\[ \sup_{(\gamma,g)\in\Gamma_n\cap R}|\mathcal P_n(\gamma,g) - \mathcal P_{n,P}(\gamma,g)| = O_p(k_n^{1/r}C_n\sqrt{j_n\log(k_n)}) \tag{F.84} \]

uniformly in $P\in\mathbf P$ since $\|\hat\Sigma_n\|_{o,r} = O_p(1)$ uniformly in $P\in\mathbf P$ by Lemma B.3. Under
(F.81) and (F.84) the proof of Theorem 4.1 applies without changes, and therefore under
the no-bias condition of Assumption 5.3(ii) we obtain uniformly in $P\in\mathbf P$

\[ \vec d_H(\hat\Gamma_n\cap R,\Gamma_{0n}(P)\cap R,\|\cdot\|_2 + \|\cdot\|_{L^2_U}) = O_p\Big(\frac{k_nC_n\sqrt{j_n\log(k_n)}}{\sqrt n} + k_n^{1-1/r}\tau_n\Big) . \tag{F.85} \]


Moreover, it also follows from (F.82) that for any $(\gamma_1,g_1),(\gamma_2,g_2)\in\Gamma_n\cap R$ we have

\[ \sup_{P\in\mathbf P}\|\lambda_{n,P}^{(\gamma_1,g_1)} - \lambda_{n,P}^{(\gamma_2,g_2)}\|_{L^2_P}\lesssim\|g_1 - g_2\|_{L^2_U} . \tag{F.86} \]


In addition, by standard arguments, see e.g. Lemmas 2.2.9 and 2.2.10 in van der Vaart
and Wellner (1996), it also follows from (F.80) that uniformly in $P\in\mathbf P$ we have

\[ \max_{1\le k\le k_n}\Big|\frac1n\sum_{i=1}^n\frac{1\{Z_i\in A_{k,n}\}}{P(Z_i\in A_{k,n})} - 1\Big| = O_p\Big(\frac{\sqrt{k_n\log(k_n)}}{\sqrt n}\Big) \tag{F.87} \]

while under (F.83) Theorem 2.14.2 in van der Vaart and Wellner (1996) implies that

\[ \sup_{f\in\mathcal G_n}\Big|\frac1n\sum_{i=1}^nf(X_i) - E_P[f(X_i)]\Big| = O_p\Big(\frac{C_n\sqrt{j_n\log(k_n)}}{\sqrt n}\Big) . \tag{F.88} \]

Thus, combining (F.80), (F.87), (F.88) with the definitions in (F.76) and (F.77) yields

\[ \sup_{(\gamma,g)\in\Gamma_n\cap R}\sup_{P\in\mathbf P}\|\hat\lambda_n^{(\gamma,g)} - \lambda_{n,P}^{(\gamma,g)}\|_{L^2_P} = O_p\Big(\frac{k_nC_n\sqrt{(k_n\vee j_n)\log(k_n)}}{\sqrt n}\Big) . \tag{F.89} \]


Hence, setting $\|\theta\|_{\mathbf E} = \|\gamma\|_2 + \|g\|_{L^2_U} + \sup_{P\in\mathbf P}\|\lambda\|_{L^2_P}$ for any $(\gamma,g,\lambda) = \theta\in\mathbf B$, and

\[ \hat\Theta_n\cap R\equiv\{(\gamma,g,\hat\lambda_n^{(\gamma,g)}) : (\gamma,g)\in\hat\Gamma_n\cap R\} , \tag{F.90} \]

we finally obtain from $\Theta_{0n}(P)\cap R = \{(\gamma,g,\lambda_{n,P}^{(\gamma,g)}) : (\gamma,g)\in\Gamma_{0n}(P)\cap R\}$, results (F.85),
(F.86), and (F.89) that uniformly in $P\in\mathbf P$ we have that

\[ \vec d_H(\hat\Theta_n\cap R,\Theta_{0n}(P)\cap R,\|\cdot\|_{\mathbf E}) = O_p\Big(\frac{k_nC_n\sqrt{(k_n\vee j_n)\log(k_n)}}{\sqrt n} + k_n^{1-1/r}\tau_n\Big) . \tag{F.91} \]

For the rest of the following discussion, we thus let $\nu_n = k_n^{1-1/r}$ and set $\mathcal R_n$ to equal

\[ \mathcal R_n\equiv\frac{k_nC_n\sqrt{(k_n\vee j_n)\log(k_n)}}{\sqrt n} ; \tag{F.92} \]

compare to result (48) in the main text.


With regards to Section 5, we refer to Corollary G.1 for sufficient conditions for
Assumption 5.1, while Assumption 5.2 is satisfied with $\kappa_\rho = 1$ and some $K_\rho<\infty$ since
$\|\theta\|_{\mathbf E} = \|\gamma\|_2 + \|g\|_{L^2_U} + \sup_{P\in\mathbf P}\|\lambda\|_{L^2_P}$ for any $(\gamma,g,\lambda) = \theta\in\mathbf B$ and we assumed (F.82)
holds. Moreover, from (F.72) and arguing as in (F.42) we calculate


(j 

sup Jn(rj,X n , || • || L 2 ) < y/j n Vfc„(j?log( —) + 7 ) , 
Pep p V 


(F.93) 


which together with (F.92) implies that a sufficient condition for Assumption 5.3(i) is


kh +r {j n V kn)C n log 2 (n) 


n 


= o(a n ) . 


(F.94) 


In turn, we note the definitions of $\psi(X_i,\gamma,g)$ and $\rho(X_i,\theta)$ in (F.66) and (F.70) imply

\[ \nabla m_P(\theta)[h] = E_P[\psi(X_i,\gamma,g)|Z_i] + \lambda(Z_i) , \tag{F.95} \]

and hence Assumption 5.4(i)-(ii) holds with $K_m = 0$, while Assumption 5.4(iii) is
satisfied for some $M_m<\infty$ since we assumed that $E_P[P_{ij}^2(H_{ij}) + P_{ij}^2(H_{ij'})]$ is bounded
uniformly in $P\in\mathbf P$ and imposed condition (F.82).

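Turning to the construction of the bootstrap statistic, we first recall that any $\theta\in\hat\Theta_n\cap R$
is of the form $\theta = (\gamma,g,\hat\lambda_n^{(\gamma,g)})$ for some $(\gamma,g)\in\hat\Gamma_n\cap R$, see (F.90). Therefore,

\[ \hat{\mathbb W}_n\rho(\cdot,\theta) * q_n^{k_n} = \frac{1}{\sqrt n}\sum_{i=1}^n\omega_i\Big\{(\psi(X_i,\gamma,g) + \hat\lambda_n^{(\gamma,g)}(Z_i)) * q_n^{k_n}(Z_i) - \Big(\frac1n\sum_{j=1}^n\psi(X_j,\gamma,g) * q_n^{k_n}(Z_j)\Big)\vee 0\Big\} \tag{F.96} \]

for any $(\gamma,g,\lambda) = \theta\in\hat\Theta_n\cap R$. Next, also note that for any $\theta\in\mathbf B$ and $h = (\gamma,g,\lambda)$,
definitions (F.66) and (F.70) imply that the estimator $\hat{\mathbb D}_n(\theta)[h]$ (as in (63)) is equal to

\[ \hat{\mathbb D}_n(\theta)[h] = \frac1n\sum_{i=1}^n\{\psi(X_i,\gamma,g) + \lambda(Z_i)\} * q_n^{k_n}(Z_i) . \tag{F.97} \]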

We further observe that if we metrize the topology on $\mathbf B = \mathbf R\times C^1([0,1])\times\ell^\infty(\mathbf R^{d_z})$
by $\|\theta\|_{\mathbf B} = \|\gamma\|_2\vee\|g\|_{1,\infty}\vee\|\lambda\|_\infty$ for any $\theta = (\gamma,g,\lambda)\in\mathbf B$, then Assumption 6.1(i)
holds with $K_b = 3$. In turn, we also note that since $\Upsilon_G:\mathbf B\to\ell^\infty([0,1])\times\ell^\infty(\mathbf R^{d_z})$
is linear and continuous, Assumptions 6.2(i)-(iii) are satisfied with $K_g = 0$, some finite
$M_g$, and $\nabla\Upsilon_G(\theta)[h] = -(g',\lambda)$ for any $\theta\in\mathbf B$ and $h = (\gamma,g,\lambda)$. Similarly, since $\Upsilon_F:\mathbf B\to\mathbf R$
is affine and $\nabla\Upsilon_F(\theta)[h] = \gamma$ for any $\theta\in\mathbf B$ and $h = (\gamma,g,\lambda)$, Assumption 6.4 is
automatically satisfied, while Assumption 6.3 holds with $K_f = 0$ and $M_f = 1$.

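Writing each $\hat\theta\in\hat\Theta_n\cap R$ in the form $\hat\theta = (\hat\gamma,\hat g,q_n^{k_n\prime}\hat\pi)$, we then finally obtain that

\[ \hat U_n(R) = \inf_{\hat\theta\in\hat\Theta_n\cap R}\ \inf_{(\gamma,\beta,\pi)}\big\|\hat{\mathbb W}_n\rho(\cdot,\hat\theta) * q_n^{k_n} + \hat{\mathbb D}_n(\hat\theta)[(\gamma,p_n^{j_n\prime}\beta,q_n^{k_n\prime}\pi)]\big\|_{\hat\Sigma_n,r} \]
\[ \text{s.t. (i) } \gamma = 0,\quad \text{(ii) } \frac{\pi}{\sqrt n}\ge 0\wedge(r_n - \hat\pi),\quad \text{(iii) } \frac{\nabla p_n^{j_n\prime}\beta}{\sqrt n}\ge 0\wedge(r_n - \hat g'),\quad \text{(iv) } \Big\|\frac{p_n^{j_n\prime}\beta}{\sqrt n}\Big\|_{1,\infty}\vee\Big\|\frac{\pi}{\sqrt n}\Big\|_\infty\le\ell_n \tag{F.98} \]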


where constraint (i) corresponds to $\Upsilon_F(\hat\theta + h/\sqrt n) = 0$, the restrictions in (ii) and (iii)
impose $h/\sqrt n\in G_n(\hat\theta)$, and the constraint in (iv) demands $\|h/\sqrt n\|_{\mathbf B}\le\ell_n$; compare to
the definition of $V_n(\theta,\ell_n)$ in (69). As in Section 7, constraint (iii) reduces to a finite
number of linear constraints when employing orthonormalized b-splines of order 3 as
the basis $\{p_{j,n}\}_{j=1}^{j_n}$. Moreover, under such a choice of sieve we further have

\[ \|p_n^{j_n\prime}\beta\|_{1,\infty}\lesssim j_n^{3/2}\|\beta\|_2 , \tag{F.99} \]

(see Newey (1997)) and thus exploiting $\|\beta\|_2\le j_n^{1/2}\|\beta\|_\infty$ it may be preferable for ease
of computation to replace the constraint $\|p_n^{j_n\prime}\beta\|_{1,\infty}\le\sqrt n\ell_n$ in (F.98) by the more
conservative but linear constraints $\|\beta\|_\infty\le\sqrt n\ell_n/j_n^2$.

We refer to Theorem H.1 for sufficient conditions for Assumption 6.5 to hold, and
focus on the rate requirements imposed by Assumption 6.6. To this end, we first observe


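\[ \sup_{\pi\in\mathbf R^{k_n}}\frac{\|q_n^{k_n\prime}\pi\|_\infty}{\sup_{P\in\mathbf P}\|q_n^{k_n\prime}\pi\|_{L^2_P}}\lesssim\sup_{\pi\in\mathbf R^{k_n}}\frac{\|\pi\|_\infty\sqrt{k_n}}{\|\pi\|_2} = \sqrt{k_n} \tag{F.100} \]

where we exploited that (F.80) implies $\sup_{P\in\mathbf P}\|q_n^{k_n\prime}\pi\|_{L^2_P}\asymp\|\pi\|_2/\sqrt{k_n}$, and that
$\|q_n^{k_n\prime}\pi\|_\infty = \|\pi\|_\infty$ due to $\{A_{k,n}\}_{k=1}^{k_n}$ being a partition of $\mathbf R^{d_z}$. Hence, results (F.99)
and (F.100) together imply that

\[ \mathcal S_n(\mathbf B,\mathbf E)\lesssim j_n^{3/2}\vee k_n^{1/2} . \tag{F.101} \]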


We think it advisable to set $\tau_n = 0$, which automatically satisfies Assumption 6.6(i) and is
simpler to implement, though we note that in contrast to Examples 2.1, 2.2, and 2.3 such
a choice may lead to a loss of power. We further note that the rate of convergence derived
in (F.92) and the bound in (F.101) imply Assumption 6.6(iii) is satisfied provided

\[ \frac{k_nC_n\sqrt{(k_n\vee j_n)\log(k_n)}\,(j_n^{3/2}\vee k_n^{1/2})}{\sqrt n} = o(r_n) . \tag{F.102} \]

Finally, we note (F.93) implies the conditions imposed on $\ell_n$ by Assumption 6.6 become

\[ \ell_nk_n^{1/r}\sqrt{(j_n\vee k_n)\log(k_n)}\log(C_n) = o(a_n) . \tag{F.103} \]

In parallel to Examples 2.1, 2.2, and 2.3 it may be possible to establish under additional
conditions that the bandwidth $\ell_n$ is not necessary, i.e. that constraint (iv) may be
dropped in (F.98). Unfortunately, such a conclusion cannot be reached by applying
Lemma 6.1 due to a failure of the condition $\|h\|_{\mathbf E}\le\nu_n\|\mathbb D_{n,P}(\theta_0)[h]\|_r$ for all $\theta_0\in(\Theta_{0n}(P)\cap R)^\epsilon$
and $h\in\sqrt n\{\mathbf B_n\cap R - \theta_0\}$, which is required by Lemma 6.1. ■


Appendix G - Uniform Coupling Results 


In this Appendix we develop uniform coupling results for empirical processes that
help verify Assumption 5.1 in specific applications. The results are based on the Hungarian
construction of Massart (1989) and Koltchinskii (1994), and are stated in a notation
that abstracts from the rest of the paper due to the potential independent interest of
the results. Thus, in this Appendix we consider $V\in\mathbf R^{d_v}$ as a generic random variable
distributed according to $P\in\mathbf P$ and denote its support under $P$ by $\Omega(P)\subset\mathbf R^{d_v}$.

The rates obtained through a Hungarian construction crucially depend on the ability
of the functions inducing the empirical process to be approximated by a suitable
Haar basis. Here, we follow Koltchinskii (1994) and control the relevant approximation
errors through primitive conditions stated in terms of the integral modulus of continuity.
Specifically, for $\lambda$ the Lebesgue measure and a function $f:\mathbf R^{d_v}\to\mathbf R$, the integral
modulus of continuity of $f$ is the function $\varpi(f,\cdot,P):\mathbf R_+\to\mathbf R_+$ given by

\[ \varpi(f,h,P)\equiv\sup_{\|s\|\le h}\Big(\int_{\Omega(P)}(f(v+s) - f(v))^21\{v+s\in\Omega(P)\}\,d\lambda(v)\Big)^{1/2} . \tag{G.1} \]

Intuitively, the integral modulus of continuity quantifies the "smoothness" of a function
$f$ by examining the difference between $f$ and its own translation. For Lipschitz functions
$f$, it is straightforward to show for instance that $\varpi(f,h,P)\lesssim h$. In contrast, indicator
functions such as $f(v) = 1\{v\le t\}$ typically satisfy $\varpi(f,h,P)\lesssim h^{1/2}$.
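The two rates just mentioned can be checked numerically in a simple special case. The sketch below is only illustrative and rests on assumptions not in the text: the support is taken to be $[0,1]$ (so $d_v = 1$), the integral in (G.1) is replaced by a Riemann sum, the supremum by a maximum over a grid of shifts, and the function name and grid sizes are hypothetical.

```python
import numpy as np

def integral_modulus(f, h, grid_size=20001, shifts=201):
    """Riemann-sum approximation of (G.1) on the assumed support [0, 1]."""
    v = np.linspace(0.0, 1.0, grid_size)
    dv = v[1] - v[0]
    best = 0.0
    for s in np.linspace(-h, h, shifts):
        # Indicator that the translated point v + s stays inside the support.
        inside = (v + s >= 0.0) & (v + s <= 1.0)
        sq_diff = (f(v + s) - f(v)) ** 2 * inside
        best = max(best, float(np.sqrt(sq_diff.sum() * dv)))
    return best

h = 0.04
# Lipschitz example f(v) = v: the modulus is h * sqrt(1 - h), of order h.
lipschitz_mod = integral_modulus(lambda v: v, h)
# Indicator example f(v) = 1{v <= 0.5}: the modulus is sqrt(h), of order h^(1/2).
indicator_mod = integral_modulus(lambda v: (v <= 0.5).astype(float), h)
```

For the identity function the squared difference equals $s^2$ on a set of length $1-|s|$, while for the indicator it equals one on a set of length $|s|$, which is why the second modulus decays only at the square-root rate.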

The uniform coupling result will be established under the following Assumptions:

Assumption G.1. (i) For all $P\in\mathbf P$, $P\ll\lambda$ and $\Omega(P)$ is compact; (ii) The densities
$dP/d\lambda$ satisfy $\sup_{P\in\mathbf P}\sup_{v\in\Omega(P)}\frac{dP}{d\lambda}(v)<\infty$ and $\inf_{P\in\mathbf P}\inf_{v\in\Omega(P)}\frac{dP}{d\lambda}(v)>0$.

Assumption G.2. (i) For each $P\in\mathbf P$ there is a continuously differentiable bijection
$T_P:[0,1]^{d_v}\to\Omega(P)$; (ii) The Jacobian $JT_P:[0,1]^{d_v}\to\mathbf R$ and derivative $T_P'$ of $T_P$
satisfy $\inf_{P\in\mathbf P}\inf_{v\in[0,1]^{d_v}}|JT_P(v)|>0$ and $\sup_{P\in\mathbf P}\sup_{v\in[0,1]^{d_v}}\|T_P'(v)\|_{o,2}<\infty$.

Assumption G.3. The classes of functions $\mathcal F_n$ satisfy: (i) $\sup_{P\in\mathbf P}\sup_{f\in\mathcal F_n}\varpi(f,h,P)\le\varphi_n(h)$
for some $\varphi_n:\mathbf R_+\to\mathbf R_+$ satisfying $\varphi_n(Ch)\le C^\kappa\varphi_n(h)$ for all $n$, $C>0$, and
some $\kappa>0$; and (ii) $\sup_{P\in\mathbf P}\sup_{f\in\mathcal F_n}\|f\|_{L^\infty_P}\le K_n$ for some $K_n>0$.

In Assumption G.1 we impose that $V\sim P$ be continuously distributed for all $P\in\mathbf P$,
with uniformly (in $P$) bounded supports and densities bounded from above and away
from zero. Assumption G.2 requires that the support of $V$ under each $P$ be "smooth"
in the sense that it may be seen as a differentiable transformation of the unit cube.
Together, Assumptions G.1 and G.2 enable us to construct partitions of $\Omega(P)$ such that
the diameter of each set in the partition is controlled uniformly in $P$; see Lemma G.1.
As a result, the approximation error by the Haar bases implied by each partition can be
controlled uniformly by the integral modulus of continuity; see Lemma G.2. Together
with Assumption G.3, which imposes conditions on the integral modulus of continuity of
$\mathcal F_n$ uniformly in $P$, we can obtain a uniform coupling result through Koltchinskii (1994).
We note that the homogeneity condition on $\varphi_n$ in Assumption G.3(i) is not necessary,
but imposed to simplify the expression for the bound.

The following theorem provides us with the foundation for verifying Assumption 5.1.

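Theorem G.1. Let Assumptions G.1-G.3 hold, $\{V_i\}_{i=1}^n$ be i.i.d. with $V_i\sim P\in\mathbf P$ and
for any $\delta_n\downarrow 0$ let $N_n\equiv\sup_{P\in\mathbf P}N_{[\,]}(\delta_n,\mathcal F_n,\|\cdot\|_{L^2_P})$, $J_n\equiv\sup_{P\in\mathbf P}J_{[\,]}(\delta_n,\mathcal F_n,\|\cdot\|_{L^2_P})$, and

\[ S_n\equiv\Big(\sum_{i=0}^{\lceil\log_2n\rceil}2^i\varphi_n^2(2^{-i/d_v})\Big)^{1/2} . \tag{G.2} \]

If $N_n\uparrow\infty$, then there exist processes $\{\mathbb W_{n,P}\}_{n=1}^\infty$ such that uniformly in $P\in\mathbf P$ we have

\[ \|\mathbb G_{n,P} - \mathbb W_{n,P}\|_{\mathcal F_n} = O_p\Big(\frac{K_n\log(nN_n)}{\sqrt n} + \frac{K_n\sqrt{\log(nN_n)\log(n)}\,S_n}{\sqrt n} + J_n\Big(1 + \frac{J_nK_n}{\delta_n^2\sqrt n}\Big)\Big) . \tag{G.3} \]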


Theorem G.1 is a mild modification of the results in Koltchinskii (1994). Intuitively,
the proof of Theorem G.1 relies on a coupling of the empirical process on a sequence
of grids of cardinality $N_n$, and relying on equicontinuity of both the empirical and
isonormal processes to obtain a coupling on $\mathcal F_n$. The conclusion of Theorem G.1 applies
to any choice of grid accuracy $\delta_n$. In order to obtain the best rate that Theorem G.1 can
deliver, however, the sequence $\delta_n$ must be chosen to balance the terms in (G.3) and thus
depends on the metric entropy of the class $\mathcal F_n$. The following Corollary illustrates the use
of Theorem G.1 by establishing a coupling result for Euclidean classes. We emphasize,
however, that different metric entropy assumptions on $\mathcal F_n$ lead to alternative optimal
choices of $\delta_n$ in Theorem G.1 and thus also to differing coupling rates.

Corollary G.1. Let Assumptions 3.1, 3.2(i), G.1, G.2 hold, and $\sup_{f\in\mathcal F_n}\|f\|_{L^\infty_P}$ be
bounded uniformly in $n$ and $P\in\mathbf P$. If $\sup_{P\in\mathbf P}\sup_{f\in\mathcal F_n}\max_{k,\jmath}\varpi(fq_{k,n,\jmath},h,P)\le A_nh^\gamma$
for some $\gamma\in(0,1]$, $\sup_{P\in\mathbf P}N_{[\,]}(\epsilon,\mathcal F_n,\|\cdot\|_{L^2_P})\le(D/\epsilon)^{j_n}$ for some $j_n\uparrow\infty$ and $D<\infty$,
and $\log(k_n) = O(j_n)$, then it follows that uniformly in $P\in\mathbf P$ we have


II &n,pfqt - Wn,pfq k n 


= O r (k 1 J r B„lo S (B n k n „){^Pf + kMjph }) , 

77 , 7 /yj 77 


Below, we include the proofs of Theorem G.1, Corollary G.1, and auxiliary results.

Proof of Theorem G.1: Let $\{\Delta_i(P)\}$ be a sequence of partitions of $\Omega(P)$ as in Lemma
G.1, and $\mathcal B_{P,i}$ the $\sigma$-algebra generated by $\Delta_i(P)$. By Lemma G.2 and Assumption G.3,

\[ \sup_{P\in\mathbf P}\sup_{f\in\mathcal F_n}\Big(\sum_{i=0}^{\lceil\log_2n\rceil}2^iE_P[(f(V) - E_P[f(V)|\mathcal B_{P,i}])^2]\Big)^{1/2}\le C_1\Big(\sum_{i=0}^{\lceil\log_2n\rceil}2^i\varphi_n^2(2^{-i/d_v})\Big)^{1/2}\equiv C_1S_n \tag{G.4} \]

for some constant $C_1>0$, and for $S_n$ as defined in (G.2). Next, let $\mathcal F_{P,n,\delta_n}\subseteq\mathcal F_n$ denote
a finite $\delta_n$-net of $\mathcal F_n$ with respect to $\|\cdot\|_{L^2_P}$. Since $N(\epsilon,\mathcal F_n,\|\cdot\|_{L^2_P})\le N_{[\,]}(\epsilon,\mathcal F_n,\|\cdot\|_{L^2_P})$,
it follows from the definition of $N_n$ that we may choose $\mathcal F_{P,n,\delta_n}$ so that

\[ \sup_{P\in\mathbf P}\mathrm{card}(\mathcal F_{P,n,\delta_n})\le\sup_{P\in\mathbf P}N_{[\,]}(\delta_n,\mathcal F_n,\|\cdot\|_{L^2_P})\equiv N_n . \tag{G.5} \]

By Theorem 3.5 in Koltchinskii (1994), (G.4) and (G.5), it follows that for each $n\ge 1$
there exists an isonormal process $\mathbb W_{n,P}$, such that for all $\eta_1>0$, $\eta_2>0$


\[ \sup_{P\in\mathbf P}P\Big(\frac{\sqrt n}{K_n}\|\mathbb G_{n,P} - \mathbb W_{n,P}\|_{\mathcal F_{P,n,\delta_n}}\ge\eta_1 + \sqrt{\eta_1}\sqrt{\eta_2}(C_1S_n + 1)\Big)\lesssim N_ne^{-C_2\eta_1} + ne^{-C_2\eta_2} , \tag{G.6} \]

for some $C_2>0$. Since $N_n\uparrow\infty$, (G.6) implies for any $\epsilon>0$ there are $C_3>0$, $C_4>0$
sufficiently large, such that setting $\eta_1 = C_3\log(N_n)$ and $\eta_2 = C_3\log(n)$ yields

\[ \sup_{P\in\mathbf P}P\Big(\|\mathbb G_{n,P} - \mathbb W_{n,P}\|_{\mathcal F_{P,n,\delta_n}}\ge C_4K_n\times\frac{\log(nN_n) + \sqrt{\log(N_n)\log(n)}\,S_n}{\sqrt n}\Big)<\epsilon . \tag{G.7} \]

Next, note that by definition of $\mathcal F_{P,n,\delta_n}$, there exists a $\Gamma_{n,P}:\mathcal F_n\to\mathcal F_{P,n,\delta_n}$ such that
$\sup_{P\in\mathbf P}\sup_{f\in\mathcal F_n}\|f - \Gamma_{n,P}f\|_{L^2_P}\le\delta_n$. Let $D(\epsilon,\mathcal F_n,\|\cdot\|_{L^2_P})$ denote the $\epsilon$-packing number
for $\mathcal F_n$ under $\|\cdot\|_{L^2_P}$, and note $D(\epsilon,\mathcal F_n,\|\cdot\|_{L^2_P})\le N_{[\,]}(\epsilon,\mathcal F_n,\|\cdot\|_{L^2_P})$. Therefore, by
Corollary 2.2.8 in van der Vaart and Wellner (1996) we can conclude that


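\[ \sup_{P\in\mathbf P}E_P[\|\mathbb W_{n,P} - \mathbb W_{n,P}\circ\Gamma_{n,P}\|_{\mathcal F_n}]\lesssim\sup_{P\in\mathbf P}\int_0^{\delta_n}\sqrt{\log D(\epsilon,\mathcal F_n,\|\cdot\|_{L^2_P})}\,d\epsilon\le\sup_{P\in\mathbf P}J_{[\,]}(\delta_n,\mathcal F_n,\|\cdot\|_{L^2_P})\equiv J_n . \tag{G.8} \]

Similarly, employing Lemma 3.4.2 in van der Vaart and Wellner (1996) yields that

\[ \sup_{P\in\mathbf P}E_P[\|\mathbb G_{n,P} - \mathbb G_{n,P}\circ\Gamma_{n,P}\|_{\mathcal F_n}]\lesssim\sup_{P\in\mathbf P}J_{[\,]}(\delta_n,\mathcal F_n,\|\cdot\|_{L^2_P})\Big(1 + \sup_{P\in\mathbf P}\frac{J_{[\,]}(\delta_n,\mathcal F_n,\|\cdot\|_{L^2_P})K_n}{\delta_n^2\sqrt n}\Big) = J_n\Big(1 + \frac{J_nK_n}{\delta_n^2\sqrt n}\Big) . \tag{G.9} \]

Therefore, combining results (G.7), (G.8), and (G.9) together with the decomposition

\[ \|\mathbb G_{n,P} - \mathbb W_{n,P}\|_{\mathcal F_n}\le\|\mathbb G_{n,P} - \mathbb W_{n,P}\|_{\mathcal F_{P,n,\delta_n}} + \|\mathbb G_{n,P} - \mathbb G_{n,P}\circ\Gamma_{n,P}\|_{\mathcal F_n} + \|\mathbb W_{n,P} - \mathbb W_{n,P}\circ\Gamma_{n,P}\|_{\mathcal F_n} \tag{G.10} \]

establishes the claim of the Theorem by Markov's inequality. ■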

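Proof of Corollary G.1: Define the class $\mathcal G_n\equiv\{fq_{k,n,\jmath} :$ for some $f\in\mathcal F_n$, $1\le\jmath\le\mathcal J$
and $1\le k\le k_{n,\jmath}\}$, and note that Lemma B.1 implies that for any $\delta_n\downarrow 0$

\[ \sup_{P\in\mathbf P}N_{[\,]}(\delta_n,\mathcal G_n,\|\cdot\|_{L^2_P})\le k_n\times\sup_{P\in\mathbf P}N_{[\,]}(\delta_n/B_n,\mathcal F_n,\|\cdot\|_{L^2_P})\le k_n\times\Big(\frac{DB_n}{\delta_n}\Big)^{j_n} . \tag{G.11} \]

Similarly, exploiting (G.11), $\delta_n\downarrow 0$ and $\int_0^a\log(M/u)\,du = a\log(M/a) + a$ we conclude

\[ \sup_{P\in\mathbf P}J_{[\,]}(\delta_n,\mathcal G_n,\|\cdot\|_{L^2_P})\le\int_0^{\delta_n}\Big\{\log(k_n) + j_n\log\Big(\frac{DB_n}{\epsilon}\Big)\Big\}^{1/2}d\epsilon \]
\[ \lesssim(\sqrt{\log(k_n)} + \sqrt{j_n})\int_0^{\delta_n}\log\Big(\frac{DB_n}{\epsilon}\Big)d\epsilon\lesssim(\sqrt{\log(k_n)} + \sqrt{j_n})\times\delta_n\log\Big(\frac{B_n}{\delta_n}\Big) . \tag{G.12} \]

In turn, note that for $S_n$ as defined in (G.2), we obtain from $\varphi_n(h) = A_nh^\gamma$ that

\[ S_n = \Big\{\sum_{i=0}^{\lceil\log_2n\rceil}2^i\times\frac{A_n^2}{2^{2i\gamma/d_v}}\Big\}^{1/2}\lesssim A_n\times n^{\frac12 - \frac{\gamma}{d_v}} . \tag{G.13} \]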
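The geometric-sum calculation behind (G.13) can be checked numerically. The following sketch is only an illustration and makes an assumption that the text does not state: it restricts attention to $2\gamma<d_v$, so that the summands $2^{i(1-2\gamma/d_v)}$ grow geometrically and the last terms dominate. The function names and the parameter values are hypothetical.

```python
import math

def s_n(n, phi, d_v):
    """S_n from (G.2): square root of sum_{i=0}^{ceil(log2 n)} 2^i * phi(2^(-i/d_v))^2."""
    top = math.ceil(math.log2(n))
    total = sum((2 ** i) * phi(2.0 ** (-i / d_v)) ** 2 for i in range(top + 1))
    return math.sqrt(total)

def power_rate(n, a_n, gamma, d_v):
    """The rate A_n * n^(1/2 - gamma/d_v) on the right-hand side of (G.13)."""
    return a_n * n ** (0.5 - gamma / d_v)

def geometric_constant(gamma, d_v):
    """Constant rho / sqrt(rho - 1), rho = 2^(1 - 2*gamma/d_v); needs 2*gamma < d_v."""
    rho = 2.0 ** (1.0 - 2.0 * gamma / d_v)
    if rho <= 1.0:
        raise ValueError("this bound requires 2 * gamma < d_v")
    return rho / math.sqrt(rho - 1.0)

# Illustrative values (assumptions): A_n = 2, gamma = 1/2, d_v = 3.
a_n, gamma, d_v = 2.0, 0.5, 3
sample_sizes = (10, 100, 1000, 10 ** 4, 10 ** 6)
ratios = [s_n(n, lambda h: a_n * h ** gamma, d_v) / power_rate(n, a_n, gamma, d_v)
          for n in sample_sizes]
```

In this regime each entry of `ratios` lies between one and `geometric_constant(gamma, d_v)`, so $S_n$ and $A_nn^{1/2-\gamma/d_v}$ are of the same order.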


Finally, we also note that $\sup_{P\in\mathbf P}\sup_{g\in\mathcal G_n}\|g\|_{L^\infty_P}\lesssim B_n$ due to Assumption 3.2(i) and $f\in\mathcal F_n$
being uniformly bounded by hypothesis. Therefore, setting $\delta_n = \sqrt{j_n}/\sqrt n$ in (G.11) and
(G.12), and exploiting $\|a\|_r\le d^{1/r}\|a\|_\infty$ for any $a\in\mathbf R^d$ we obtain that


sup ||G n P /q£" - W niP /^ n || r 
f€T n 


< k^WG^p- W n ,p|| 6n = O p {k^B n \og(B n k n n){^p- + ^og(B n n) 

rCH av Wn 


(G.14) 


uniformly in $P\in\mathbf P$ by Theorem G.1. ■


Lemma G.1. Let $\mathcal B_P$ denote the completion of the Borel $\sigma$-algebra on $\Omega(P)$ with respect
to $P$. If Assumptions G.1(i)-(ii) and G.2(i)-(ii) hold, then for each $P\in\mathbf P$ there exists
a sequence $\{\Delta_i(P)\}$ of partitions of the probability space $(\Omega(P),\mathcal B_P,P)$ such that:

(i) $\Delta_i(P) = \{\Delta_{i,k}(P) : k = 0,\ldots,2^i - 1\}$, $\Delta_{i,k}(P)\in\mathcal B_P$ and $\Delta_{0,0}(P) = \Omega(P)$.

(ii) $\Delta_{i,k}(P) = \Delta_{i+1,2k}(P)\cup\Delta_{i+1,2k+1}(P)$ and $\Delta_{i+1,2k}(P)\cap\Delta_{i+1,2k+1}(P) = \emptyset$ for any
integers $k = 0,\ldots,2^i - 1$ and $i\ge 0$.

(iii) $P(\Delta_{i+1,2k}(P)) = P(\Delta_{i+1,2k+1}(P)) = 2^{-i-1}$ for $k = 0,\ldots,2^i - 1$, $i\ge 0$.

(iv) $\sup_{P\in\mathbf P}\max_{0\le k\le 2^i - 1}\sup_{v,v'\in\Delta_{i,k}(P)}\|v - v'\|_2 = O(2^{-i/d_v})$.

(v) $\mathcal B_P$ equals the completion with respect to $P$ of the $\sigma$-algebra generated by $\bigcup_{i\ge 0}\Delta_i(P)$.
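An empirical analogue of properties (i)-(iv) may help fix ideas. The sketch below is not the population construction used in the proof: it assumes a sample of points in $[0,1]^{d_v}$, replaces conditional medians under $P$ by sample medians, and cycles through the coordinates when splitting. The function names are hypothetical and the uniform design is chosen purely for illustration.

```python
import numpy as np

def dyadic_median_partition(points, depth):
    """Split the sample recursively at coordinate-wise sample medians.

    Returns a list of index arrays, one per cell, after `depth` rounds of
    splitting; round `level` splits along coordinate `level % d`.
    """
    n, d = points.shape
    cells = [np.arange(n)]
    for level in range(depth):
        coord = level % d
        refined = []
        for idx in cells:
            order = idx[np.argsort(points[idx, coord], kind="stable")]
            half = len(order) // 2
            refined.extend([order[:half], order[half:]])
        cells = refined
    return cells

def max_box_diagonal(points, cells):
    """Largest bounding-box diagonal over cells (an upper bound on cell diameter)."""
    return max(float(np.linalg.norm(points[c].max(axis=0) - points[c].min(axis=0)))
               for c in cells)

# Illustrative sample (assumption): 4096 uniform draws on the unit square.
rng = np.random.default_rng(1)
pts = rng.random((4096, 2))
cells = dyadic_median_partition(pts, depth=6)
coarse_diag = max_box_diagonal(pts, dyadic_median_partition(pts, depth=2))
fine_diag = max_box_diagonal(pts, cells)
```

With a sample size that is a power of two every cell at depth $i$ holds exactly a fraction $2^{-i}$ of the sample, mirroring (iii), while the cell diameters shrink as the depth grows, in the spirit of (iv).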


PROOF: Let $\mathcal A$ denote the Borel $\sigma$-algebra on $[0,1]^{d_v}$, and for any $A\in\mathcal A$ define

\[ Q_P(A)\equiv P(T_P(A)) , \tag{G.15} \]

where $T_P(A)\in\mathcal B_P$ due to $T_P^{-1}$ being measurable. Moreover, $Q_P([0,1]^{d_v}) = 1$ due to $T_P$
being surjective, and $Q_P$ is $\sigma$-additive due to $T_P$ being injective. Hence, we conclude
$Q_P$ defined by (G.15) is a probability measure on $([0,1]^{d_v},\mathcal A)$. In addition, for $\lambda$ the
Lebesgue measure, we obtain from Theorem 3.7.1 in Bogachev (2007) that

\[ Q_P(A) = P(T_P(A)) = \int_{T_P(A)}\frac{dP}{d\lambda}(v)\,d\lambda(v) = \int_A\frac{dP}{d\lambda}(T_P(a))|JT_P(a)|\,d\lambda(a) , \tag{G.16} \]

where $|JT_P(a)|$ denotes the Jacobian of $T_P$ at any point $a\in[0,1]^{d_v}$. Hence, $Q_P$ has
density with respect to Lebesgue measure given by $g_P(a)\equiv\frac{dP}{d\lambda}(T_P(a))|JT_P(a)|$ for any
$a\in[0,1]^{d_v}$. Next, let $a = (a_1,\ldots,a_{d_v})'\in[0,1]^{d_v}$ and define for any $t\in[0,1]$

\[ G_{l,P}(t|A)\equiv\frac{Q_P(a\in A : a_l\le t)}{Q_P(A)} \tag{G.17} \]

for any set $A\in\mathcal A$ and $1\le l\le d_v$. Further let $m(i)\equiv i - \lfloor\frac{i-1}{d_v}\rfloor\times d_v$, i.e. $m(i)$ equals $i$
modulo $d_v$, and setting $\tilde\Delta_{0,0}(P) = [0,1]^{d_v}$ inductively define the partitions (of $[0,1]^{d_v}$)

\[ \tilde\Delta_{i+1,2k}(P)\equiv\{a\in\tilde\Delta_{i,k}(P) : G_{m(i+1),P}(a_{m(i+1)}|\tilde\Delta_{i,k}(P))\le\tfrac12\} \]
\[ \tilde\Delta_{i+1,2k+1}(P)\equiv\tilde\Delta_{i,k}(P)\setminus\tilde\Delta_{i+1,2k}(P) \tag{G.18} \]







for $0\le k\le 2^i - 1$. For $\mathrm{cl}(\tilde\Delta_{i,k}(P))$ the closure of $\tilde\Delta_{i,k}(P)$, we then note that by
construction each $\tilde\Delta_{i,k}(P)$ is a hyper-rectangle in $[0,1]^{d_v}$, i.e. it is of the general form

\[ \mathrm{cl}(\tilde\Delta_{i,k}(P)) = \prod_{j=1}^{d_v}[l_{i,k,j}(P),u_{i,k,j}(P)] . \tag{G.19} \]

Moreover, since $g_P$ is positive everywhere on $[0,1]^{d_v}$ by Assumptions G.1(ii) and G.2(ii),
it follows that for any $i\ge 0$, $0\le k\le 2^i - 1$ and $1\le j\le d_v$, we have

\[ l_{i+1,2k,j}(P) = l_{i,k,j}(P),\qquad u_{i+1,2k,j}(P) = \begin{cases} u_{i,k,j}(P) & \text{if } j\ne m(i+1) \\ \text{solves } G_{m(i+1),P}(u_{i+1,2k,j}(P)|\tilde\Delta_{i,k}(P)) = \tfrac12 & \text{if } j = m(i+1) \end{cases} \tag{G.20} \]

Similarly, since $\tilde\Delta_{i+1,2k+1}(P) = \tilde\Delta_{i,k}(P)\setminus\tilde\Delta_{i+1,2k}(P)$, it additionally follows that

\[ u_{i+1,2k+1,j}(P) = u_{i,k,j}(P),\qquad l_{i+1,2k+1,j}(P) = \begin{cases} l_{i,k,j}(P) & \text{if } j\ne m(i+1) \\ u_{i+1,2k,j}(P) & \text{if } j = m(i+1) \end{cases} \tag{G.21} \]

Since $Q_P(\mathrm{cl}(\tilde\Delta_{i+1,2k}(P))) = Q_P(\tilde\Delta_{i+1,2k}(P))$ by $Q_P\ll\lambda$, (G.17) and (G.20) yield

\[ Q_P(\tilde\Delta_{i+1,2k}(P)) = Q_P(a\in\tilde\Delta_{i,k}(P) : a_{m(i+1)}\le u_{i+1,2k,m(i+1)}(P)) \]
\[ = G_{m(i+1),P}(u_{i+1,2k,m(i+1)}(P)|\tilde\Delta_{i,k}(P))\,Q_P(\tilde\Delta_{i,k}(P)) = \tfrac12Q_P(\tilde\Delta_{i,k}(P)) . \tag{G.22} \]

Therefore, since $\tilde\Delta_{i,k}(P) = \tilde\Delta_{i+1,2k}(P)\cup\tilde\Delta_{i+1,2k+1}(P)$, it follows that $Q_P(\tilde\Delta_{i+1,2k+1}(P)) = \tfrac12Q_P(\tilde\Delta_{i,k}(P))$
for $0\le k\le 2^i - 1$ as well. In particular, $Q_P(\tilde\Delta_{0,0}(P)) = 1$ implies that

\[ Q_P(\tilde\Delta_{i,k}(P)) = 2^{-i} \tag{G.23} \]

for any integers $i\ge 1$ and $0\le k\le 2^i - 1$. Moreover, we note that result (G.16) and
Assumptions G.1(ii) and G.2(ii) together imply that the densities $g_P$ of $Q_P$ satisfy

\[ 0<\inf_{P\in\mathbf P}\inf_{a\in[0,1]^{d_v}}g_P(a)\le\sup_{P\in\mathbf P}\sup_{a\in[0,1]^{d_v}}g_P(a)<\infty , \tag{G.24} \]

and therefore $Q_P(A)\asymp\lambda(A)$ uniformly in $A\in\mathcal A$ and $P\in\mathbf P$. Hence, since by (G.20)
$u_{i+1,2k,j}(P) = u_{i,k,j}(P)$ and $l_{i+1,2k,j}(P) = l_{i,k,j}(P)$ for all $j\ne m(i+1)$, we obtain


\[ \frac{u_{i+1,2k,m(i+1)}(P) - l_{i+1,2k,m(i+1)}(P)}{u_{i,k,m(i+1)}(P) - l_{i,k,m(i+1)}(P)} = \frac{\prod_{j=1}^{d_v}(u_{i+1,2k,j}(P) - l_{i+1,2k,j}(P))}{\prod_{j=1}^{d_v}(u_{i,k,j}(P) - l_{i,k,j}(P))} = \frac{\lambda(\tilde\Delta_{i+1,2k}(P))}{\lambda(\tilde\Delta_{i,k}(P))}\asymp\frac{Q_P(\tilde\Delta_{i+1,2k}(P))}{Q_P(\tilde\Delta_{i,k}(P))} = \frac12 \tag{G.25} \]

uniformly in $P\in\mathbf P$, $i\ge 0$, and $0\le k\le 2^i - 1$ by results (G.23) and (G.24). Moreover,
by identical arguments but using (G.21) instead of (G.20) we conclude

\[ \frac{u_{i+1,2k+1,m(i+1)}(P) - l_{i+1,2k+1,m(i+1)}(P)}{u_{i,k,m(i+1)}(P) - l_{i,k,m(i+1)}(P)}\asymp\frac12 \tag{G.26} \]

also uniformly in $P\in\mathbf P$, $i\ge 0$ and $0\le k\le 2^i - 1$. Thus, since $(u_{i+1,2k,j}(P) - l_{i+1,2k,j}(P)) = (u_{i+1,2k+1,j}(P) - l_{i+1,2k+1,j}(P)) = (u_{i,k,j}(P) - l_{i,k,j}(P))$ for all $j\ne m(i+1)$,
and $u_{0,0,j}(P) - l_{0,0,j}(P) = 1$ for all $1\le j\le d_v$ we obtain from $m(i) = i - \lfloor\frac{i-1}{d_v}\rfloor\times d_v$,
results (G.25) and (G.26), and proceeding inductively that

\[ (u_{i,k,j}(P) - l_{i,k,j}(P))\asymp 2^{-i/d_v} \tag{G.27} \]

uniformly in $P\in\mathbf P$, $i\ge 0$, $0\le k\le 2^i - 1$, and $1\le j\le d_v$. Thus, result (G.27) yields

\[ \sup_{P\in\mathbf P}\max_{0\le k\le 2^i - 1}\sup_{a,a'\in\tilde\Delta_{i,k}(P)}\|a - a'\|_2\le\sup_{P\in\mathbf P}\max_{0\le k\le 2^i - 1}\max_{1\le j\le d_v}\sqrt{d_v}\times(u_{i,k,j}(P) - l_{i,k,j}(P)) = O(2^{-i/d_v}) . \tag{G.28} \]


We next obtain the desired sequence of partitions $\{\Delta_i(P)\}$ of $(\Omega(P),\mathcal B_P,P)$ by
constructing them from the partition $\{\tilde\Delta_{i,k}(P)\}$ of $[0,1]^{d_v}$. To this end, set

\[ \Delta_{i,k}(P)\equiv T_P(\tilde\Delta_{i,k}(P)) \tag{G.29} \]

for all $i\ge 0$ and $0\le k\le 2^i - 1$. Note that $\{\Delta_i(P)\}$ satisfies conditions (i) and (ii)
due to $T_P^{-1}$ being a measurable map, $T_P$ being bijective, and result (G.18). In addition,
$\{\Delta_i(P)\}$ satisfies condition (iii) since by definition (G.15) and result (G.23):

\[ P(\Delta_{i,k}(P)) = P(T_P(\tilde\Delta_{i,k}(P))) = Q_P(\tilde\Delta_{i,k}(P)) = 2^{-i} , \tag{G.30} \]

for all $0\le k\le 2^i - 1$. Moreover, by Assumption G.2(ii), $\sup_{P\in\mathbf P}\sup_{a\in[0,1]^{d_v}}\|T_P'(a)\|_{o,2}<\infty$,
and hence by the mean value theorem we can conclude that

\[ \sup_{P\in\mathbf P}\max_{0\le k\le 2^i - 1}\sup_{v,v'\in\Delta_{i,k}(P)}\|v - v'\|_2 = \sup_{P\in\mathbf P}\max_{0\le k\le 2^i - 1}\sup_{a,a'\in\tilde\Delta_{i,k}(P)}\|T_P(a) - T_P(a')\|_2 \]
\[ \lesssim\sup_{P\in\mathbf P}\max_{0\le k\le 2^i - 1}\sup_{a,a'\in\tilde\Delta_{i,k}(P)}\|a - a'\|_2 = O(2^{-i/d_v}) , \tag{G.31} \]

by result (G.28), which verifies that $\{\Delta_i(P)\}$ satisfies condition (iv). Also note that to
verify $\{\Delta_i(P)\}$ satisfies condition (v) it suffices to show that $\bigcup_{i\ge 0}\Delta_i(P)$ generates the
Borel $\sigma$-algebra on $\Omega(P)$. To this end, we first aim to show that

\[ \mathcal A = \sigma\Big(\bigcup_{i\ge 0}\tilde\Delta_i(P)\Big) , \tag{G.32} \]

where for a collection of sets $\mathcal C$, $\sigma(\mathcal C)$ denotes the $\sigma$-algebra generated by $\mathcal C$. For any
closed set $A\in\mathcal A$, then define $D_i(P)$ to be given by

\[ D_i(P)\equiv\bigcup_{k:\tilde\Delta_{i,k}(P)\cap A\ne\emptyset}\tilde\Delta_{i,k}(P) . \tag{G.33} \]

Notice that since $\{\tilde\Delta_i(P)\}$ is a partition of $[0,1]^{d_v}$, $A\subseteq D_i(P)$ for all $i\ge 0$ and hence
$A\subseteq\bigcap_{i\ge 0}D_i(P)$. Moreover, if $a_0\in A^c$, then $A^c$ being open and (G.28) imply $a_0\notin D_i(P)$
for $i$ sufficiently large. Hence, $A^c\cap(\bigcap_{i\ge 0}D_i(P)) = \emptyset$ and therefore $A = \bigcap_{i\ge 0}D_i(P)$. It
follows that if $A$ is closed, then $A\in\sigma(\bigcup_{i\ge 0}\tilde\Delta_i(P))$, which implies $\mathcal A\subseteq\sigma(\bigcup_{i\ge 0}\tilde\Delta_i(P))$.
On the other hand, since $\tilde\Delta_{i,k}(P)$ is Borel for all $i\ge 0$ and $0\le k\le 2^i - 1$, we also have
$\sigma(\bigcup_{i\ge 0}\tilde\Delta_i(P))\subseteq\mathcal A$, and hence (G.32) follows. To conclude, we then note that

\[ \sigma\Big(\bigcup_{i\ge 0}\Delta_i(P)\Big) = \sigma\Big(\bigcup_{i\ge 0}T_P(\tilde\Delta_i(P))\Big) = T_P\Big(\sigma\Big(\bigcup_{i\ge 0}\tilde\Delta_i(P)\Big)\Big) = T_P(\mathcal A) , \tag{G.34} \]

by Corollary 1.2.9 in Bogachev (2007). However, $T_P$ and $T_P^{-1}$ being continuous implies
$T_P(\mathcal A)$ equals the Borel $\sigma$-algebra in $\Omega(P)$, and therefore (G.34) implies $\{\Delta_i(P)\}$ satisfies
condition (v) establishing the Lemma. ■

Lemma G.2. Let {Aj(P)} be as in Lemma G.l, and Bp^ denote the cr-algebra generated 
by Ai(P). If Assumptions G.l(i)-(ii) and G.2(i)-(ii) hold, then there are constants 
Ko > 0, K\ > 0 such that for all P £ P and any f satisfying f £ L 2 p for all P £ P: 

E P [(f(V) - Ep[f{V)\B P ^}) 2 } <K 0 x w 2 {f,K l x 2 ~l,P) . 
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PROOF: Since A*(P) is a partition of fi(P) and P(A,;fc(P)) = 2 1 for all i > 0 and 
0 < k < 2* — 1, we may express Ep[f(V)\Bp,i\ as an element of L 2 P by 


Ep[f{V)\B P ,] 


2 l —1 


2* E G A iA p )} 


k=0 


A ilfc (P) 


f(v)dP(y) 


(G.35) 


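Hence, result (G.35), $P(\Delta_{i,k}(P)) = 2^{-i}$ for all $i \geq 0$ and $0 \leq k \leq 2^i - 1$ together with $\Delta_i(P)$ being a partition of $\Omega(P)$, and applying Hölder's inequality to the term $(f(v) - f(\tilde v))1\{\tilde v \in \Omega(P)\} \times 1\{\tilde v \in \Delta_{i,k}(P)\}$ we obtain that

$$\begin{aligned}
E_P[(f(V) - E_P[f(V)|\mathcal{B}_{P,i}])^2]
&= \sum_{k=0}^{2^i-1} \int_{\Delta_{i,k}(P)} \Big(f(v) - 2^i \int_{\Delta_{i,k}(P)} f(\tilde v)\, dP(\tilde v)\Big)^2 dP(v) \\
&= \sum_{k=0}^{2^i-1} 2^{2i} \int_{\Delta_{i,k}(P)} \Big(\int_{\Delta_{i,k}(P)} (f(v) - f(\tilde v)) 1\{\tilde v \in \Omega(P)\}\, dP(\tilde v)\Big)^2 dP(v) \\
&\leq \sum_{k=0}^{2^i-1} 2^{2i} P(\Delta_{i,k}(P)) \int_{\Delta_{i,k}(P)} \int_{\Delta_{i,k}(P)} (f(v) - f(\tilde v))^2 1\{\tilde v \in \Omega(P)\}\, dP(\tilde v)\, dP(v) \\
&= \sum_{k=0}^{2^i-1} 2^{i} \int_{\Delta_{i,k}(P)} \int_{\Delta_{i,k}(P)} (f(v) - f(\tilde v))^2 1\{\tilde v \in \Omega(P)\}\, dP(\tilde v)\, dP(v) .
\end{aligned} \tag{G.36}$$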


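Let $D_i = \sup_{P \in \mathbf{P}} \max_{0 \leq k \leq 2^i - 1} \text{diam}\{\Delta_{i,k}(P)\}$, where $\text{diam}\{\Delta_{i,k}(P)\}$ is the diameter of $\Delta_{i,k}(P)$. Further note that by Lemma G.1(iv), $D_i = O(2^{-i/d_v})$ and hence we have $\lambda(\{s \in \mathbf{R}^{d_v} : \|s\| \leq D_i\}) \leq M_1 2^{-i}$ for some $M_1 > 0$ and $\lambda$ the Lebesgue measure. Noting that $\sup_{P \in \mathbf{P}} \sup_{v \in \Omega(P)} \frac{dP}{d\lambda}(v) < \infty$ by Assumption G.1(ii), and doing the change of variables $s = \tilde v - v$ we then obtain that for some constant $M_0 > 0$

$$\begin{aligned}
E_P[(f(V) - E_P[f(V)|\mathcal{B}_{P,i}])^2]
&\leq M_0 \sum_{k=0}^{2^i-1} 2^i \int_{\Delta_{i,k}(P)} \int_{\Delta_{i,k}(P)} (f(v) - f(\tilde v))^2 1\{\tilde v \in \Omega(P)\}\, d\lambda(\tilde v)\, d\lambda(v) \\
&\leq M_0 M_1 \sup_{\|s\| \leq D_i} \sum_{k=0}^{2^i-1} \int_{\Delta_{i,k}(P)} (f(v + s) - f(v))^2 1\{v + s \in \Omega(P)\}\, d\lambda(v) .
\end{aligned} \tag{G.37}$$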


Next observe that $\varpi(f, h, P)$ is nondecreasing in $h$. Hence, since $\{\Delta_{i,k}(P) : k = 0, \ldots, 2^i - 1\}$ is a partition of $\Omega(P)$, and $D_i \leq K_1 2^{-i/d_v}$ for some $K_1 > 0$ by Lemma G.1(iv), we obtain

$$E_P[(f(V) - E_P[f(V)|\mathcal{B}_{P,i}])^2] \leq M_0 M_1 \times \varpi^2(f, K_1 \times 2^{-i/d_v}, P) \tag{G.38}$$

by (G.37). Setting $K_0 = M_0 \times M_1$ in (G.38) establishes the claim of the Lemma. ■
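The content of Lemma G.2 can be illustrated numerically in a deliberately simplified setting that is not the one treated in the paper: we take $d_v = 1$, $V$ uniform on $[0,1]$, and equal-length dyadic intervals in the role of the cells $\Delta_{i,k}(P)$. The function name `dyadic_projection_error`, the grid size, and the test function are all hypothetical choices made for this sketch. It computes $E[(f(V) - E[f(V)|\mathcal{B}_i])^2]$ on a midpoint grid and shows the error shrinking as the partition is refined.

```python
import numpy as np

def dyadic_projection_error(f, level, grid_size=2**14):
    """Mean squared error of E[f(V) | dyadic cell] for V ~ Uniform[0, 1].

    Assumption of this sketch: the cells are the 2**level equal-length
    intervals of [0, 1], each with probability 2**(-level).
    """
    # Midpoint grid on [0, 1]; grid_size must be divisible by 2**level.
    v = (np.arange(grid_size) + 0.5) / grid_size
    values = f(v)
    # Each row collects the grid points falling in one dyadic cell.
    cells = values.reshape(2**level, -1)
    cell_means = cells.mean(axis=1, keepdims=True)
    return float(np.mean((cells - cell_means) ** 2))

# A Lipschitz function with a kink, so it is not differentiable everywhere.
f = lambda v: np.abs(v - 0.3)
errors = [dyadic_projection_error(f, i) for i in range(1, 9)]
```

Because this $f$ has Lipschitz constant one, its oscillation within a cell of length $2^{-i}$ is at most $2^{-i}$, so each entry of `errors` is bounded by $2^{-2i}$, in line with the role the cell diameter plays in (G.37).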
Appendix H - Multiplier Bootstrap Results


In this Appendix, we develop results that enable us to provide sufficient conditions for verifying that Assumption 6.5 is satisfied. The results in this Appendix may be of independent interest, as they extend the validity of the multiplier bootstrap to suitable non-Donsker classes. In particular, applying Theorem H.1 below to the case $q_{k,n}(z) = 1$ and $k_n = 1$ for all $n$ yields the consistency of the multiplier bootstrap for the law of the standard empirical process indexed by expanding classes $\mathcal{F}_n$ of functions.

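Our analysis requires the classes $\mathcal{F}_n$ be sufficiently "smooth" in that they satisfy:

Assumption H.1. For each $P \in \mathbf{P}$ and $n$ there exists a $\{p_{j,n,P}\}_{j=1}^\infty \subset L^2_P$ such that: (i) $\{p_{j,n,P}\}_{j=1}^\infty$ is orthonormal in $L^2_P$ and $\|p_{j,n,P}\|_{L^\infty_P}$ is uniformly bounded in $j, n \in \mathbf{N}$ and $P \in \mathbf{P}$; (ii) For any $j_n \uparrow \infty$ and $p^{j_n}_{n,P}(v) = (p_{1,n,P}(v), \ldots, p_{j_n,n,P}(v))'$ the eigenvalues of $E_P[(q^{k_n}_n(Z_i) \otimes p^{j_n}_{n,P}(V_i))(q^{k_n}_n(Z_i) \otimes p^{j_n}_{n,P}(V_i))']$ are bounded uniformly in $n$ and $P \in \mathbf{P}$; (iii) For some $M_n \uparrow \infty$ and $\gamma_\rho > 3/2$ we have for all $P \in \mathbf{P}$ the inclusion

$$\mathcal{F}_n \subseteq \Big\{ f = \sum_{j=1}^\infty p_{j,n,P}(v) \beta_j : \{\beta_j\}_{j=1}^\infty \text{ satisfies } |\beta_j| \leq \frac{M_n}{j^{\gamma_\rho}} \Big\} . \tag{H.1}$$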

Assumption H.1(i) demands the existence of orthonormal and bounded functions $\{p_{j,n,P}\}_{j=1}^\infty$ in $L^2_P$ that provide suitable approximations to the class $\mathcal{F}_n$ in the sense imposed in Assumption H.1(iii). Crucially, we emphasize that the array $\{p_{j,n,P}\}_{j=1}^\infty$ need not be known as it is merely employed in the theoretical construction of the bootstrap coupling, and not in the computation of the multiplier bootstrap process $\hat{\mathbb{W}}_n$. In certain applications, however, such as when $\rho$ is linear in $\theta$ and linear sieves are employed as $\Theta_n$, the functions $\{p_{j,n,P}\}_{j=1}^\infty$ may be set to equal a rotation of the sieve.$^{21}$ It is also worth pointing out that, as in Appendix G, the concept of "smoothness" employed does not necessitate that $\rho$ be differentiable in its arguments. Finally, Assumption H.1(ii) constrains the eigenvalues of $E_P[(q^{k_n}_n(Z_i) \otimes p^{j_n}_{n,P}(V_i))(q^{k_n}_n(Z_i) \otimes p^{j_n}_{n,P}(V_i))']$ to be bounded from above. This requirement may be dispensed with, allowing the largest eigenvalues to diverge with $n$, at the cost of slowing the rate of convergence of the Gaussian multiplier bootstrap $\hat{\mathbb{W}}_n$ to the corresponding isonormal process $\mathbb{W}^\star_{n,P}$.
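To make the requirements of Assumption H.1 concrete, the following sketch checks them for one hypothetical choice that is not taken from the paper: $V$ uniform on $[0,1]$ and the cosine basis $p_j(v) = \sqrt{2}\cos(j\pi v)$, which is orthonormal in $L^2$ and bounded by $\sqrt{2}$. The coefficient sequence $\beta_j = M/j^{\gamma}$, the truncation at 50 terms, and all variable names are assumptions of the example. It also evaluates the sup-norm of the remainder beyond $j_n$ terms against the integral bound $\sqrt{2} M j_n^{-(\gamma-1)}/(\gamma-1)$.

```python
import numpy as np

def cosine_basis(v, num_terms):
    """Rows are p_j(v) = sqrt(2) * cos(j * pi * v) for j = 1, ..., num_terms."""
    j = np.arange(1, num_terms + 1)[:, None]
    return np.sqrt(2.0) * np.cos(j * np.pi * v[None, :])

# Midpoint grid used to approximate expectations under Uniform[0, 1].
grid = (np.arange(4096) + 0.5) / 4096
basis = cosine_basis(grid, 50)
gram = basis @ basis.T / grid.size  # approximates E[p_j(V) p_k(V)]

# Coefficients on the boundary of the class in (H.1), truncated at 50 terms.
M, gamma = 1.0, 2.0
beta = M / np.arange(1, 51) ** gamma

j_n = 10
tail = beta[j_n:] @ basis[j_n:]  # remainder using the terms j = j_n + 1, ..., 50
tail_sup = float(np.max(np.abs(tail)))
tail_bound = float(np.sqrt(2.0) * M * j_n ** (-(gamma - 1)) / (gamma - 1))
```

Here `gram` is numerically the identity matrix, and `tail_sup` stays below `tail_bound`, which is the kind of control on the part of the class not spanned by the first $j_n$ basis functions that the decay condition in (H.1) is designed to deliver.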

As we next show, Assumption H.1 provides a sufficient condition for verifying Assumption 6.5. In the following, recall $F_n$ is the envelope of $\mathcal{F}_n$ (as in Assumption 3.3(ii)).

Theorem H.1. Let Assumptions 3.1, 3.2(i), 3.3(ii), H.1 hold, and $\{\omega_i\}_{i=1}^n$ be i.i.d. with $\omega_i \sim N(0,1)$ independent of $\{V_i\}_{i=1}^n$. Then, for any $j_n \uparrow \infty$ with $j_n k_n \log(j_n k_n) B_n^2 = o(n)$ there is an isonormal $\mathbb{W}^\star_{n,P}$ independent of $\{V_i\}_{i=1}^n$ satisfying uniformly in $P \in \mathbf{P}$

$$\sup_{f \in \mathcal{F}_n} \|\hat{\mathbb{W}}_n f q^{k_n}_n - \mathbb{W}^\star_{n,P} f q^{k_n}_n\|_r = O_p\Big( \frac{\sup_{P \in \mathbf{P}} \|F_n\|_{L^2_P} B_n^{1/2} k_n^{1/4 + 1/r} (j_n \log(k_n))^{3/4} (\log(j_n))^{1/4}}{n^{1/4}} + \frac{k_n^{1/r} B_n M_n \sqrt{\log(k_n)}}{j_n^{\gamma_\rho - 3/2}} \Big) .$$

20 Concretely, if $\Theta_n = \{f = p^{j_n \prime}\gamma : \text{for some } \gamma \in \mathbf{R}^{j_n}\}$ and $X = (Y, W')'$ with $Y \in \mathbf{R}$ and $\rho(X, \theta) = Y - \theta(W)$, then a candidate choice for $p^{j_n+1}_{n,P}(v)$ are the orthonormalized functions $(y, p^{j_n}(w)')'$.
The rate of convergence derived in Theorem H.1 depends on the selected sequence $j_n \uparrow \infty$, which should be chosen optimally to deliver the best possible implied rate. Heuristically, the proof of Theorem H.1 proceeds in two steps. First, we construct a multivariate normal random variable $\mathbb{W}^\star_{n,P}(q^{k_n}_n \otimes p^{j_n}_{n,P}) \in \mathbf{R}^{j_n k_n}$ that is coupled with $\hat{\mathbb{W}}_n(q^{k_n}_n \otimes p^{j_n}_{n,P}) \in \mathbf{R}^{j_n k_n}$, and then exploit the linearity of $\hat{\mathbb{W}}_n$ to obtain a suitable coupling on the subspace $\mathbb{S}_{n,P} = \text{span}\{q^{k_n}_n \otimes p^{j_n}_{n,P}\}$. Second, we employ Assumption H.1(iii) to show that a successful coupling on $\mathbb{S}_{n,P}$ leads to the desired construction since $\mathcal{F}_n$ is well approximated by $\{p_{j,n,P}\}_{j=1}^\infty$. We note that the rate obtained in Theorem H.1 may be improved upon whenever the smallest eigenvalues of the matrices

$$E_P[(q^{k_n}_n(Z_i) \otimes p^{j_n}_{n,P}(V_i))(q^{k_n}_n(Z_i) \otimes p^{j_n}_{n,P}(V_i))'] \tag{H.2}$$

are bounded away from zero uniformly in $n$ and $P \in \mathbf{P}$; see Remark 8.1. Additionally, while we do not pursue it here for conciseness, it is also worth noting that the outlined heuristics can also be employed to verify Assumption 5.1 by coupling $\mathbb{G}_{n,P}(q^{k_n}_n \otimes p^{j_n}_{n,P})$ to $\mathbb{W}_{n,P}(q^{k_n}_n \otimes p^{j_n}_{n,P})$ through standard results (Yurinskii, 1977).
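For readers who want to see the object being coupled, the sketch below simulates a Gaussian multiplier process on a finite list of functions. The exact definition of $\hat{\mathbb{W}}_n$ is given earlier in the paper and is not restated in this Appendix, so the form used here, $n^{-1/2}\sum_i \omega_i \{f(V_i) - \frac{1}{n}\sum_{i'} f(V_{i'})\}$ with $q_{k,n} = 1$ and $k_n = 1$, is an assumption of the example, as are the function name `multiplier_bootstrap` and the three test functions. Conditional on the data, the draws are mean-zero Gaussian with covariance equal to the sample covariance of the function values.

```python
import numpy as np

def multiplier_bootstrap(values, num_draws, rng):
    """Simulate a Gaussian multiplier process over finitely many functions.

    values: array of shape (n, m) holding f_l(V_i) for m functions.
    Returns an array of shape (num_draws, m); each row is one bootstrap draw.
    Assumed form: n^{-1/2} * sum_i omega_i * (f(V_i) - sample mean of f).
    """
    n = values.shape[0]
    centered = values - values.mean(axis=0, keepdims=True)
    omega = rng.standard_normal((num_draws, n))  # i.i.d. N(0, 1) multipliers
    return omega @ centered / np.sqrt(n)

rng = np.random.default_rng(0)
v = rng.uniform(size=200)
values = np.column_stack([v, v**2, np.sin(2 * np.pi * v)])

draws = multiplier_bootstrap(values, num_draws=10000, rng=rng)
sample_cov = np.cov(values, rowvar=False, bias=True)
bootstrap_cov = draws.T @ draws / draws.shape[0]
```

Comparing `bootstrap_cov` with `sample_cov` confirms that, given the data, the simulated process has the sample covariance as its covariance kernel; no knowledge of a basis $\{p_{j,n,P}\}$ is needed to compute the draws, consistent with the remark that the basis enters only the theoretical coupling.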

Remark 8.1. Under the additional requirement that the eigenvalues of (H.2) be bounded away from zero uniformly in $n$ and $P \in \mathbf{P}$, Theorem H.1 can be modified to establish

$$\sup_{f \in \mathcal{F}_n} \|\hat{\mathbb{W}}_n f q^{k_n}_n - \mathbb{W}^\star_{n,P} f q^{k_n}_n\|_r = O_p\Big( \frac{\sup_{P \in \mathbf{P}} \|F_n\|_{L^2_P} B_n k_n^{1/2 + 1/r} j_n \log(k_n) \sqrt{\log(j_n)}}{\sqrt{n}} + \frac{k_n^{1/r} B_n M_n \sqrt{\log(k_n)}}{j_n^{\gamma_\rho - 3/2}} \Big) . \tag{H.3}$$

Given the assumed orthonormality of the array $\{p_{j,n,P}\}_{j=1}^\infty$, the rate obtained in (H.3) is thus more appropriate when considering the multiplier bootstrap for the standard empirical process - i.e. $q_{k,n}(z) = 1$ and $k_n = 1$ for all $n$ - since the smallest eigenvalue of the matrices in (H.2) then equals one. ■

Below, we include the proof of Theorem H.1 and the necessary auxiliary results.

Proof of Theorem H.1: We proceed by exploiting Lemma H.1 to couple $\hat{\mathbb{W}}_n$ on a finite dimensional subspace, and showing that such a result suffices for controlling both $\hat{\mathbb{W}}_n$ and $\mathbb{W}^\star_{n,P}$ on $\mathcal{F}_n$. To this end, let $\mathbb{S}_{n,P} = \text{span}\{q^{k_n}_n \otimes p^{j_n}_{n,P}\}$ and note that Lemma H.1 and $j_n \uparrow \infty$ satisfying $j_n k_n \log(j_n k_n) B_n^2 = o(n)$ by hypothesis imply that there exists a linear isonormal process $\mathbb{W}^{(1)}_{n,P}$ on $\mathbb{S}_{n,P}$ such that uniformly in $P \in \mathbf{P}$ we have

$$\sup_{\|\beta\|_2 \leq \sup_{P \in \mathbf{P}} \|F_n\|_{L^2_P}} \|\hat{\mathbb{W}}_n(\beta' p^{j_n}_{n,P}) q^{k_n}_n - \mathbb{W}^{(1)}_{n,P}(\beta' p^{j_n}_{n,P}) q^{k_n}_n\|_r = O_p\Big( \frac{\sup_{P \in \mathbf{P}} \|F_n\|_{L^2_P} B_n^{1/2} k_n^{1/4 + 1/r} (j_n \log(k_n))^{3/4} (\log(j_n))^{1/4}}{n^{1/4}} \Big) . \tag{H.4}$$


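For any closed linear subspace $\mathbb{A}$ of $L^2_P$, let $\text{Proj}\{f|\mathbb{A}\}$ denote the $\|\cdot\|_{L^2_P}$ projection of $f$ onto $\mathbb{A}$ and set $\mathbb{A}^\perp = \{f \in L^2_P : f = g - \text{Proj}\{g|\mathbb{A}\} \text{ for some } g \in L^2_P\}$ - i.e. $\mathbb{A}^\perp$ denotes the orthocomplement of $\mathbb{A}$ in $L^2_P$. Assuming the underlying probability space is suitably enlarged to carry a linear isonormal process $\mathbb{W}^{(2)}_{n,P}$ on $\mathbb{S}^\perp_{n,P}$ independent of $\mathbb{W}^{(1)}_{n,P}$ and $\{V_i\}_{i=1}^n$, we then define the isonormal process $\mathbb{W}^\star_{n,P}$ on $L^2_P$ pointwise by

$$\mathbb{W}^\star_{n,P} f = \mathbb{W}^{(1)}_{n,P}(\text{Proj}\{f|\mathbb{S}_{n,P}\}) + \mathbb{W}^{(2)}_{n,P}(\text{Proj}\{f|\mathbb{S}^\perp_{n,P}\}) . \tag{H.5}$$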

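Next, set $\mathbb{P}_{n,P} = \text{span}\{p^{j_n}_{n,P}\}$ and note that since $\text{Proj}\{f|\mathbb{P}_{n,P}\} = \beta(f)' p^{j_n}_{n,P}$ for some $\beta(f) \in \mathbf{R}^{j_n}$, the orthonormality of $\{p_{j,n,P}\}_{j=1}^{j_n}$ imposed in Assumption H.1(i) implies $\|\beta(f)\|_2 \leq \|f\|_{L^2_P} \leq \|F_n\|_{L^2_P}$ by Assumption 3.3(ii). Since $(\text{Proj}\{f|\mathbb{P}_{n,P}\}) q_{k,n,\jmath} \in \mathbb{S}_{n,P}$ for any $f \in \mathcal{F}_n$, $1 \leq \jmath \leq \mathcal{J}$ and $1 \leq k \leq k_{n,\jmath}$, (H.4) and (H.5) imply uniformly in $P \in \mathbf{P}$

$$\begin{aligned}
\sup_{f \in \mathcal{F}_n} & \|\hat{\mathbb{W}}_n(\text{Proj}\{f|\mathbb{P}_{n,P}\}) q^{k_n}_n - \mathbb{W}^\star_{n,P}(\text{Proj}\{f|\mathbb{P}_{n,P}\}) q^{k_n}_n\|_r \\
&\leq \sup_{\|\beta\|_2 \leq \sup_{P \in \mathbf{P}} \|F_n\|_{L^2_P}} \|\hat{\mathbb{W}}_n(\beta' p^{j_n}_{n,P}) q^{k_n}_n - \mathbb{W}^{(1)}_{n,P}(\beta' p^{j_n}_{n,P}) q^{k_n}_n\|_r \\
&= O_p\Big( \frac{\sup_{P \in \mathbf{P}} \|F_n\|_{L^2_P} B_n^{1/2} k_n^{1/4 + 1/r} (j_n \log(k_n))^{3/4} (\log(j_n))^{1/4}}{n^{1/4}} \Big) .
\end{aligned} \tag{H.6}$$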


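Next, define the set of sequences $\mathcal{B}_n = \{\{\beta_j\}_{j=j_n}^\infty : |\beta_j| \leq M_n/j^{\gamma_\rho}\}$, and note that

$$\sup_{f \in \mathcal{F}_n} \|\mathbb{W}^\star_{n,P}(\text{Proj}\{f|\mathbb{P}^\perp_{n,P}\}) q^{k_n}_n\|_r \leq k_n^{1/r} \sup_{\{\beta_j\} \in \mathcal{B}_n} \max_{1 \leq \jmath \leq \mathcal{J}} \max_{1 \leq k \leq k_{n,\jmath}} \Big|\mathbb{W}^\star_{n,P}\Big(q_{k,n,\jmath} \sum_{j=j_n}^\infty \beta_j p_{j,n,P}\Big)\Big| \tag{H.7}$$

by Assumption H.1(iii). Moreover, also note that for any $\{\beta_j\}, \{\tilde\beta_j\} \in \mathcal{B}_n$, we have that

$$\Big\{E_P\Big[q^2_{k,n,\jmath}(Z_{i,\jmath}) \Big(\sum_{j=j_n}^\infty (\beta_j - \tilde\beta_j) p_{j,n,P}(V_i)\Big)^2\Big]\Big\}^{1/2} \lesssim B_n \sum_{j=j_n}^\infty |\beta_j - \tilde\beta_j| \tag{H.8}$$

by Assumptions 3.2(i) and H.1(i). Hence, since $\mathbb{W}^\star_{n,P}$ is sub-Gaussian with respect to $\|\cdot\|_{L^2_P}$, defining $\mathcal{G}_n = \{f \in L^2_P : f = q_{k,n,\jmath} \sum_{j \geq j_n} \beta_j p_{j,n,P} \text{ for some } 1 \leq \jmath \leq \mathcal{J} \text{ and } 1 \leq k \leq k_{n,\jmath}, \{\beta_j\} \in \mathcal{B}_n\}$ we obtain from Corollary 2.2.8 in van der Vaart and Wellner (1996)

$$E_P\Big[\sup_{g \in \mathcal{G}_n} |\mathbb{W}^\star_{n,P} g|\Big] \lesssim \int_0^\infty \sqrt{\log N(\epsilon, \mathcal{G}_n, \|\cdot\|_{L^2_P})}\, d\epsilon \lesssim \int_0^{B_n M_n j_n^{-(\gamma_\rho - 1)}} \sqrt{\log(k_n N(\epsilon/B_n, \mathcal{B}_n, \|\cdot\|_{\ell_1}))}\, d\epsilon , \tag{H.9}$$

where the final inequality holds for $\|\beta\|_{\ell_1} = \sum_{j=j_n}^\infty |\beta_j|$ by (H.8) and noting that since $\{p_{j,n,P}\}$ are uniformly bounded by Assumption H.1(i), $\mathcal{G}_n$ has envelope $G_n$ satisfying $\|G_n\|_{L^2_P} \lesssim B_n M_n \sum_{j \geq j_n} j^{-\gamma_\rho} \lesssim B_n M_n j_n^{-(\gamma_\rho - 1)}$. Furthermore, note that Lemma H.2, the change of variables $u = \epsilon j_n^{\gamma_\rho - 1}/B_n M_n$, and $\gamma_\rho > 3/2$ additionally yield

$$\begin{aligned}
\int_0^{B_n M_n j_n^{-(\gamma_\rho - 1)}} & \sqrt{\log(k_n N(\epsilon/B_n, \mathcal{B}_n, \|\cdot\|_{\ell_1}))}\, d\epsilon \\
&\leq \frac{B_n M_n}{j_n^{\gamma_\rho - 1}} \int_0^1 \Big\{\log(k_n) + j_n \Big(\frac{2}{u(\gamma_\rho - 1)}\Big)^{\frac{1}{\gamma_\rho - 1}} \log\Big(\frac{4}{u j_n} + 1\Big)\Big\}^{1/2} du
\lesssim \frac{B_n M_n \sqrt{\log(k_n)}}{j_n^{\gamma_\rho - 3/2}} .
\end{aligned} \tag{H.10}$$

Therefore, we conclude by results (H.7), (H.9), (H.10), and Markov's inequality that

$$\sup_{f \in \mathcal{F}_n} \|\mathbb{W}^\star_{n,P}(\text{Proj}\{f|\mathbb{P}^\perp_{n,P}\}) q^{k_n}_n\|_r = O_p\Big(\frac{k_n^{1/r} B_n M_n \sqrt{\log(k_n)}}{j_n^{\gamma_\rho - 3/2}}\Big) . \tag{H.11}$$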


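In order to obtain an analogous result to (H.11) for $\hat{\mathbb{W}}_n$, we similarly note that

$$\sup_{f \in \mathcal{F}_n} \|\hat{\mathbb{W}}_n(\text{Proj}\{f|\mathbb{P}^\perp_{n,P}\}) q^{k_n}_n\|_r \leq k_n^{1/r} \sup_{\{\beta_j\} \in \mathcal{B}_n} \max_{1 \leq \jmath \leq \mathcal{J}} \max_{1 \leq k \leq k_{n,\jmath}} \Big|\hat{\mathbb{W}}_n\Big(q_{k,n,\jmath} \sum_{j=j_n}^\infty \beta_j p_{j,n,P}\Big)\Big| . \tag{H.12}$$

Moreover, since $\{\omega_i\}_{i=1}^n$ is independent of $\{V_i\}_{i=1}^n$, we also obtain from Assumption 3.2(i) and $\{p_{j,n,P}\}_{j \geq j_n}$ being uniformly bounded in $j, n$ and $P \in \mathbf{P}$ by Assumption H.1(i) that

$$\begin{aligned}
E\Big[\Big\{\hat{\mathbb{W}}_n\Big(q_{k,n,\jmath} \sum_{j \geq j_n} \beta_j p_{j,n,P}\Big) & - \hat{\mathbb{W}}_n\Big(q_{k,n,\jmath} \sum_{j \geq j_n} \tilde\beta_j p_{j,n,P}\Big)\Big\}^2 \Big| \{V_i\}_{i=1}^n\Big] \\
&\leq \frac{1}{n} \sum_{i=1}^n \Big\{q_{k,n,\jmath}(Z_{i,\jmath}) \sum_{j \geq j_n} (\beta_j - \tilde\beta_j) p_{j,n,P}(V_i)\Big\}^2 \lesssim B_n^2 \Big\{\sum_{j \geq j_n} |\beta_j - \tilde\beta_j|\Big\}^2 .
\end{aligned} \tag{H.13}$$

Hence, since $\hat{\mathbb{W}}_n$ is Gaussian conditional on $\{V_i\}_{i=1}^n$, applying Corollary 2.2.8 in van der Vaart and Wellner (1996) and arguing as in (H.9) and (H.10) implies

$$E\Big[\sup_{\{\beta_j\} \in \mathcal{B}_n} \max_{1 \leq \jmath \leq \mathcal{J}} \max_{1 \leq k \leq k_{n,\jmath}} \Big|\hat{\mathbb{W}}_n\Big(q_{k,n,\jmath} \sum_{j=j_n}^\infty \beta_j p_{j,n,P}\Big)\Big| \,\Big|\, \{V_i\}_{i=1}^n\Big] \lesssim \frac{B_n M_n \sqrt{\log(k_n)}}{j_n^{\gamma_\rho - 3/2}} . \tag{H.14}$$

Thus, results (H.12) and (H.14) together with Markov's inequality allow us to conclude

$$\sup_{f \in \mathcal{F}_n} \|\hat{\mathbb{W}}_n(\text{Proj}\{f|\mathbb{P}^\perp_{n,P}\}) q^{k_n}_n\|_r = O_p\Big(\frac{k_n^{1/r} B_n M_n \sqrt{\log(k_n)}}{j_n^{\gamma_\rho - 3/2}}\Big) \tag{H.15}$$

uniformly in $P \in \mathbf{P}$. Hence, the claim of the Theorem follows from noting that the linearity of $\hat{\mathbb{W}}_n$ and $\mathbb{W}^\star_{n,P}$ and $f = \text{Proj}\{f|\mathbb{P}_{n,P}\} + \text{Proj}\{f|\mathbb{P}^\perp_{n,P}\}$ together imply that

$$\begin{aligned}
\sup_{f \in \mathcal{F}_n} \|\hat{\mathbb{W}}_n f q^{k_n}_n - \mathbb{W}^\star_{n,P} f q^{k_n}_n\|_r
&\leq \sup_{f \in \mathcal{F}_n} \|\hat{\mathbb{W}}_n(\text{Proj}\{f|\mathbb{P}_{n,P}\}) q^{k_n}_n - \mathbb{W}^\star_{n,P}(\text{Proj}\{f|\mathbb{P}_{n,P}\}) q^{k_n}_n\|_r \\
&\quad + \sup_{f \in \mathcal{F}_n} \|\hat{\mathbb{W}}_n(\text{Proj}\{f|\mathbb{P}^\perp_{n,P}\}) q^{k_n}_n\|_r + \sup_{f \in \mathcal{F}_n} \|\mathbb{W}^\star_{n,P}(\text{Proj}\{f|\mathbb{P}^\perp_{n,P}\}) q^{k_n}_n\|_r ,
\end{aligned} \tag{H.16}$$

which in conjunction with results (H.6), (H.11), and (H.15) conclude the proof. ■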

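Lemma H.1. Let Assumptions 3.1, 3.2(i), H.1(i)-(ii) hold, and $\{\omega_i\}_{i=1}^n$ be i.i.d. with $\omega_i \sim N(0,1)$ independent of $\{V_i\}_{i=1}^n$. If $j_n k_n \log(j_n k_n) B_n^2 = o(n)$, then uniformly in $P \in \mathbf{P}$

$$\sup_{\|\beta\|_2 \leq D_n} \|\hat{\mathbb{W}}_n(\beta' p^{j_n}_{n,P}) q^{k_n}_n - \mathbb{W}^\star_{n,P}(\beta' p^{j_n}_{n,P}) q^{k_n}_n\|_r = O_p\Big( \frac{D_n B_n^{1/2} k_n^{1/4 + 1/r} (j_n \log(k_n))^{3/4} (\log(j_n))^{1/4}}{n^{1/4}} \Big)$$

for $\mathbb{W}^\star_{n,P}$ a linear isonormal process on $\mathbb{S}_{n,P} = \text{span}\{p^{j_n}_{n,P} \otimes q^{k_n}_n\}$ independent of $\{V_i\}_{i=1}^n$.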

Proof: For notational simplicity, let $d_n = j_n k_n$, set $r^{d_n}_{n,P}(v) = q^{k_n}_n(z) \otimes p^{j_n}_{n,P}(v)$, and

$$\hat\Sigma_n(P) = \frac{1}{n} \sum_{i=1}^n r^{d_n}_{n,P}(V_i) r^{d_n}_{n,P}(V_i)' , \qquad \Sigma_n(P) = E_P[r^{d_n}_{n,P}(V_i) r^{d_n}_{n,P}(V_i)'] . \tag{H.17}$$

Letting $r_{d,n,P}(v)$ denote the $d$th coordinate of $r^{d_n}_{n,P}(v)$ further note that $\|r_{d,n,P}\|_{L^\infty_P} \lesssim B_n$ since $\|p_{j,n,P}\|_{L^\infty_P}$ is uniformly bounded by Assumption H.1(i) and $\|q_{k,n,\jmath}\|_{L^\infty_P} \leq B_n$ by Assumption 3.2(i). Therefore, if for every $M > 0$ and $P \in \mathbf{P}$ we define the event

$$A_{n,P}(M) = \{\|\hat\Sigma_n^{1/2}(P) - \Sigma_n^{1/2}(P)\|_{o,2} \leq M R_n\} , \tag{H.18}$$

for $R_n = \{j_n k_n \log(j_n k_n) B_n^2/n\}^{1/4}$, then Assumption H.1(ii) and Lemma H.3 yield

$$\liminf_{M \uparrow \infty} \liminf_{n \to \infty} \inf_{P \in \mathbf{P}} P(\{V_i\}_{i=1}^n \in A_{n,P}(M)) = 1 . \tag{H.19}$$

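Next, let $\mathcal{N}_{d_n} \in \mathbf{R}^{d_n}$ follow a standard normal distribution and be independent of $\{(\omega_i, V_i)\}_{i=1}^n$ (defined on the same suitably enlarged probability space). Further let $\{\hat\nu_d\}_{d=1}^{d_n}$ denote eigenvectors of $\hat\Sigma_n(P)$, $\{\hat\lambda_d\}_{d=1}^{d_n}$ represent the corresponding (possibly zero) eigenvalues and define the random variable $Z_{n,P} \in \mathbf{R}^{d_n}$ to be given by

$$Z_{n,P} = \sum_{d : \hat\lambda_d \neq 0} \hat\nu_d \times \frac{(\hat\nu_d' \hat{\mathbb{W}}_n(r^{d_n}_{n,P}))}{\sqrt{\hat\lambda_d}} + \sum_{d : \hat\lambda_d = 0} \hat\nu_d \times (\hat\nu_d' \mathcal{N}_{d_n}) . \tag{H.20}$$

Then note that since $\hat{\mathbb{W}}_n(r^{d_n}_{n,P}) \sim N(0, \hat\Sigma_n(P))$ conditional on $\{V_i\}_{i=1}^n$, and $\mathcal{N}_{d_n}$ is independent of $\{(\omega_i, V_i)\}_{i=1}^n$, $Z_{n,P}$ is Gaussian conditional on $\{V_i\}_{i=1}^n$. Furthermore,

$$E[Z_{n,P} Z_{n,P}' | \{V_i\}_{i=1}^n] = \sum_{d=1}^{d_n} \hat\nu_d \hat\nu_d' = I_{d_n} \tag{H.21}$$

by direct calculation for $I_{d_n}$ the $d_n \times d_n$ identity matrix, and hence $Z_{n,P} \sim N(0, I_{d_n})$ conditional on $\{V_i\}_{i=1}^n$ almost surely in $\{V_i\}_{i=1}^n$ and is thus independent of $\{V_i\}_{i=1}^n$. Moreover, we also note that by Theorem 3.6.1 in Bogachev (1998) and $\hat{\mathbb{W}}_n(r^{d_n}_{n,P}) \sim N(0, \hat\Sigma_n(P))$ conditional on $\{V_i\}_{i=1}^n$, it follows that $\hat{\mathbb{W}}_n(r^{d_n}_{n,P})$ belongs to the range of $\hat\Sigma_n(P) : \mathbf{R}^{d_n} \to \mathbf{R}^{d_n}$ almost surely in $\{(\omega_i, V_i)\}_{i=1}^n$. Therefore, since $\{\hat\nu_d : \hat\lambda_d \neq 0\}_{d=1}^{d_n}$ spans the range of $\hat\Sigma_n(P)$, we conclude from (H.20) that for any $\gamma \in \mathbf{R}^{d_n}$ we must have

$$\gamma' \hat\Sigma_n^{1/2}(P) Z_{n,P} = \gamma' \sum_{d : \hat\lambda_d \neq 0} \hat\nu_d (\hat\nu_d' \hat{\mathbb{W}}_n(r^{d_n}_{n,P})) = \hat{\mathbb{W}}_n(\gamma' r^{d_n}_{n,P}) . \tag{H.22}$$

Analogously, we also define for any $\gamma \in \mathbf{R}^{d_n}$ the isonormal process $\mathbb{W}^\star_{n,P}$ on $\mathbb{S}_{n,P}$ by

$$\mathbb{W}^\star_{n,P}(\gamma' r^{d_n}_{n,P}) = \gamma' \Sigma_n^{1/2}(P) Z_{n,P} , \tag{H.23}$$

which is trivially independent of $\{V_i\}_{i=1}^n$ due to the independence of $Z_{n,P}$. Hence, letting $e_k \in \mathbf{R}^{k_n}$ denote the vector whose $i$th coordinate equals $1\{i = k\}$, and $1\{A_{n,P}(M)\}$ be an indicator for whether the event $\{V_i\}_{i=1}^n \in A_{n,P}(M)$ occurs, we obtain that

$$\begin{aligned}
\sup_{\|\beta\|_2 \leq D_n} & \|\hat{\mathbb{W}}_n(\beta' p^{j_n}_{n,P}) q^{k_n}_n - \mathbb{W}^\star_{n,P}(\beta' p^{j_n}_{n,P}) q^{k_n}_n\|_r 1\{A_{n,P}(M)\} \\
&\leq k_n^{1/r} \times \sup_{\|\beta\|_2 \leq D_n} \max_{1 \leq k \leq k_n} |(e_k \otimes \beta)'(\hat\Sigma_n^{1/2}(P) - \Sigma_n^{1/2}(P)) Z_{n,P}| 1\{A_{n,P}(M)\} .
\end{aligned} \tag{H.24}$$

Defining $T_n = \{1, \ldots, k_n\} \times \{\beta \in \mathbf{R}^{j_n} : \|\beta\|_2 \leq D_n\}$, next set for any $(k, \beta) = t \in T_n$

$$\mathbb{W}_{n,P}(t) = |(e_k \otimes \beta)'(\hat\Sigma_n^{1/2}(P) - \Sigma_n^{1/2}(P)) Z_{n,P}| 1\{A_{n,P}(M)\} , \tag{H.25}$$

and observe that conditional on $\{V_i\}_{i=1}^n$, $\mathbb{W}_{n,P}(t)$ is sub-Gaussian under $d_n(\tilde t, t) = \|(\hat\Sigma_n^{1/2}(P) - \Sigma_n^{1/2}(P))(e_{\tilde k} \otimes \tilde\beta - e_k \otimes \beta)\|_2$ for any $\tilde t = (\tilde k, \tilde\beta)$ and $t = (k, \beta)$. Moreover, by standard arguments and definition (H.18), we obtain that under $A_{n,P}(M)$

$$N(\epsilon, T_n, d_n) \lesssim k_n \times \Big(\frac{2\|\hat\Sigma_n^{1/2}(P) - \Sigma_n^{1/2}(P)\|_{o,2} D_n}{\epsilon}\Big)^{j_n} \leq k_n \times \Big(\frac{2 M R_n D_n}{\epsilon}\Big)^{j_n} . \tag{H.26}$$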


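Therefore, noting that $\sup_{t, \tilde t \in T_n} d_n(t, \tilde t) \leq 2 M D_n R_n$ under the event $A_{n,P}(M)$, we obtain from Corollary 2.2.8 in van der Vaart and Wellner (1996) and (H.26) that

$$E\Big[\sup_{t \in T_n} |\mathbb{W}_{n,P}(t)| \,\Big|\, \{V_i\}_{i=1}^n\Big] \lesssim \int_0^\infty \sqrt{\log(N(\epsilon, T_n, d_n))}\, d\epsilon \lesssim \int_0^{2 M D_n R_n} \Big\{\log(k_n) + j_n \log\Big(\frac{2 M D_n R_n}{\epsilon}\Big)\Big\}^{1/2} d\epsilon . \tag{H.27}$$

Hence, exploiting (H.27) and the change of variables $u = \epsilon/M D_n R_n$ we can conclude

$$E\Big[\sup_{t \in T_n} |\mathbb{W}_{n,P}(t)| \,\Big|\, \{V_i\}_{i=1}^n\Big] \lesssim M D_n R_n \int_0^2 \Big\{\log(k_n) + j_n \log\Big(\frac{2}{u}\Big)\Big\}^{1/2} du \lesssim M D_n R_n \times \sqrt{\log(k_n) + j_n} . \tag{H.28}$$

Next, for notational simplicity let $\delta_n = k_n^{1/r} D_n R_n \sqrt{\log(k_n) + j_n}$, and then note that results (H.24), (H.25), and (H.28) together with Markov's inequality yield

$$\begin{aligned}
P\Big(\sup_{\|\beta\|_2 \leq D_n} & \|\hat{\mathbb{W}}_n(\beta' p^{j_n}_{n,P}) q^{k_n}_n - \mathbb{W}^\star_{n,P}(\beta' p^{j_n}_{n,P}) q^{k_n}_n\|_r > M^2 \delta_n ;\ A_{n,P}(M)\Big) \\
&\leq P\Big(k_n^{1/r} \sup_{t \in T_n} |\mathbb{W}_{n,P}(t)| > M^2 \delta_n\Big) \leq E_P\Big[\frac{k_n^{1/r}}{M^2 \delta_n} E\Big[\sup_{t \in T_n} |\mathbb{W}_{n,P}(t)| \,\Big|\, \{V_i\}_{i=1}^n\Big]\Big] \lesssim \frac{1}{M} .
\end{aligned} \tag{H.29}$$

Therefore, combining results (H.19) and (H.29), we can finally conclude that

$$\begin{aligned}
\limsup_{M \uparrow \infty} \limsup_{n \to \infty} \sup_{P \in \mathbf{P}} & P\Big(\sup_{\|\beta\|_2 \leq D_n} \|\hat{\mathbb{W}}_n(\beta' p^{j_n}_{n,P}) q^{k_n}_n - \mathbb{W}^\star_{n,P}(\beta' p^{j_n}_{n,P}) q^{k_n}_n\|_r > M^2 \delta_n\Big) \\
&\lesssim \limsup_{M \uparrow \infty} \limsup_{n \to \infty} \sup_{P \in \mathbf{P}} \Big\{\frac{1}{M} + P(\{V_i\}_{i=1}^n \notin A_{n,P}(M))\Big\} = 0 ,
\end{aligned} \tag{H.30}$$

which establishes the claim of the Lemma given the definitions of $\delta_n$ and $R_n$. ■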
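The coupling in (H.20)-(H.23) is constructive, and a small numerical sketch may help fix ideas. The code below is an illustration under simplifying assumptions that are not from the paper: the multiplier process is taken without recentering, so that its conditional covariance is exactly $\hat\Sigma_n(P)$; the array `r_values` stands in for $r^{d_n}_{n,P}(V_i)$; and the function name `coupled_gaussians` and the tolerance used to decide which eigenvalues are zero are hypothetical. It builds $Z_{n,P}$ from the eigendecomposition of $\hat\Sigma_n(P)$, filling the null directions with independent normals, and returns $\hat\Sigma_n^{1/2}(P) Z_{n,P}$ and $\Sigma_n^{1/2}(P) Z_{n,P}$.

```python
import numpy as np

def coupled_gaussians(r_values, sigma_pop, rng, tol=1e-10):
    """Sketch of the construction in (H.20)-(H.23).

    r_values: (n, d) array standing in for r(V_i); sigma_pop: (d, d) population
    second-moment matrix. Returns (w_hat, sqrt(sigma_hat) @ z, sqrt(sigma_pop) @ z).
    """
    n, d = r_values.shape
    omega = rng.standard_normal(n)
    w_hat = r_values.T @ omega / np.sqrt(n)  # N(0, sigma_hat) given the data
    sigma_hat = r_values.T @ r_values / n
    lam, nu = np.linalg.eigh(sigma_hat)
    positive = lam > tol * max(lam.max(), 1.0)  # numerically nonzero eigenvalues

    # (H.20): rescale along nonzero eigen-directions, fill the rest with noise.
    extra = rng.standard_normal(d)
    scores = nu.T @ w_hat
    safe_lam = np.where(positive, lam, 1.0)
    coords = np.where(positive, scores / np.sqrt(safe_lam), nu.T @ extra)
    z = nu @ coords

    sqrt_hat = (nu * np.sqrt(np.where(positive, lam, 0.0))) @ nu.T
    lam_p, nu_p = np.linalg.eigh(sigma_pop)
    sqrt_pop = (nu_p * np.sqrt(np.clip(lam_p, 0.0, None))) @ nu_p.T
    return w_hat, sqrt_hat @ z, sqrt_pop @ z

rng = np.random.default_rng(1)
r_values = rng.uniform(-1.0, 1.0, size=(5, 8))  # n < d, so sigma_hat is singular
w_hat, recovered, w_star = coupled_gaussians(r_values, np.eye(8) / 3.0, rng)
```

In this example `recovered` reproduces `w_hat` up to rounding error even though $\hat\Sigma_n(P)$ is singular, which is the content of (H.22), while `w_star` is the coupled draw defined through the population square root as in (H.23). The gap between the two is governed by $\hat\Sigma_n^{1/2}(P) - \Sigma_n^{1/2}(P)$, the quantity controlled by the event in (H.18).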

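Lemma H.2. Let $\mathcal{B}_n = \{\{\beta_j\}_{j=j_n}^\infty : |\beta_j| \leq M_n/j^{\gamma_\rho}\}$ for some $j_n \uparrow \infty$, $M_n > 0$, and $\gamma_\rho > 1$, and define the metric $\|\beta\|_{\ell_1} = \sum_{j \geq j_n} |\beta_j|$. For any $\epsilon > 0$ it then follows that

$$\log N(\epsilon, \mathcal{B}_n, \|\cdot\|_{\ell_1}) \leq \Big\{\Big(\Big(\frac{2 M_n}{\epsilon(\gamma_\rho - 1)}\Big)^{\frac{1}{\gamma_\rho - 1}} + 1 - j_n\Big) \vee 0\Big\} \times \log\Big(\frac{4 M_n}{j_n^{\gamma_\rho} \epsilon} + 1\Big) .$$

PROOF: For any $\{\beta_j\} \in \mathcal{B}_n$ and integer $k \geq (j_n - 1)$ we first obtain the standard estimate

$$\sum_{j=k+1}^\infty |\beta_j| \leq \sum_{j=k+1}^\infty \frac{M_n}{j^{\gamma_\rho}} \leq M_n \int_k^\infty u^{-\gamma_\rho}\, du = M_n \frac{k^{-(\gamma_\rho - 1)}}{(\gamma_\rho - 1)} . \tag{H.31}$$

For any $a \in \mathbf{R}$, let $\lceil a \rceil$ denote the smallest integer larger than $a$, and further define

$$j^\star(\epsilon) = \Big\lceil \Big(\frac{2 M_n}{\epsilon(\gamma_\rho - 1)}\Big)^{\frac{1}{\gamma_\rho - 1}} \Big\rceil \vee (j_n - 1) . \tag{H.32}$$

Then note that (H.31) implies that for any $\{\beta_j\} \in \mathcal{B}_n$ we have $\sum_{j > j^\star(\epsilon)} |\beta_j| \leq \epsilon/2$. Hence, letting $\mathcal{A}_n(\epsilon) = \{\{\beta_j\} \in \mathcal{B}_n : \beta_j = 0 \text{ for all } j > j^\star(\epsilon)\}$, we obtain

$$N(\epsilon, \mathcal{B}_n, \|\cdot\|_{\ell_1}) \leq N(\epsilon/2, \mathcal{A}_n(\epsilon), \|\cdot\|_{\ell_1}) \leq \prod_{j=j_n}^{j^\star(\epsilon)} \Big\lceil \frac{4 M_n}{j^{\gamma_\rho} \epsilon} \Big\rceil \leq \Big(\Big\lceil \frac{4 M_n}{j_n^{\gamma_\rho} \epsilon} \Big\rceil\Big)^{(j^\star(\epsilon) - j_n) \vee 0} , \tag{H.33}$$

where the product should be understood to equal one if $j^\star(\epsilon) = j_n - 1$. Thus, the claim of the Lemma follows from the bound $\lceil a \rceil \leq a + 1$ and results (H.32) and (H.33). ■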
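The truncation step in the proof of Lemma H.2 can be checked numerically. The sketch below is an illustration with hypothetical parameter values and function names (`truncation_index`, `tail_bound`): it computes $j^\star(\epsilon)$ as in (H.32), reading $\lceil a \rceil$ as the smallest integer strictly larger than $a$, and compares the tail of the extreme sequence $\beta_j = M_n/j^{\gamma_\rho}$ beyond $j^\star(\epsilon)$ with the integral bound in (H.31) and with $\epsilon/2$. Since an infinite sum cannot be evaluated directly, only a long partial sum of the tail is computed.

```python
import math

def truncation_index(eps, M_n, gamma, j_n):
    """j*(eps) as in (H.32), with the ceiling read as 'smallest integer larger than'."""
    root = (2.0 * M_n / (eps * (gamma - 1.0))) ** (1.0 / (gamma - 1.0))
    return max(math.floor(root) + 1, j_n - 1)

def tail_bound(k, M_n, gamma):
    """Integral bound M_n * k^{-(gamma - 1)} / (gamma - 1) from (H.31)."""
    return M_n * k ** (-(gamma - 1.0)) / (gamma - 1.0)

# Hypothetical parameter values chosen only for illustration.
eps, M_n, gamma, j_n = 0.07, 3.0, 2.0, 4
j_star = truncation_index(eps, M_n, gamma, j_n)

# Partial sum of the extreme sequence beta_j = M_n / j^gamma beyond j_star.
partial_tail = sum(M_n / j**gamma for j in range(j_star + 1, j_star + 200001))
bound = tail_bound(j_star, M_n, gamma)
```

With these values the partial tail sum lies below the integral bound, which in turn lies below $\epsilon/2$, so that covering the first $j^\star(\epsilon) - j_n + 1$ coordinates to accuracy $\epsilon/2$ suffices, as used in (H.33).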

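Lemma H.3. Let Assumption 3.1 hold, $\{f_{d,n,P}\}_{d=1}^{d_n}$ be a triangular array of functions $f_{d,n,P} : \mathbf{R}^{d_v} \to \mathbf{R}$, and define $f^{d_n}_{n,P}(v) = (f_{1,n,P}(v), \ldots, f_{d_n,n,P}(v))'$ as well as

$$\Sigma_n(P) = E_P[f^{d_n}_{n,P}(V_i) f^{d_n}_{n,P}(V_i)'] , \qquad \hat\Sigma_n(P) = \frac{1}{n} \sum_{i=1}^n f^{d_n}_{n,P}(V_i) f^{d_n}_{n,P}(V_i)' .$$

If $\sup_{1 \leq d \leq d_n} \|f_{d,n,P}\|_{L^\infty_P} \leq C_n$ for all $P \in \mathbf{P}$, the eigenvalues of $\Sigma_n(P)$ are bounded uniformly in $n$ and $P \in \mathbf{P}$, and $d_n \log(d_n) C_n^2 = o(n)$, then it follows that

$$\limsup_{M \uparrow \infty} \limsup_{n \to \infty} \sup_{P \in \mathbf{P}} P\Big(\|\hat\Sigma_n^{1/2}(P) - \Sigma_n^{1/2}(P)\|_{o,2} > M \Big\{\frac{d_n \log(d_n) C_n^2}{n}\Big\}^{1/4}\Big) = 0 .$$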

PROOF: Set $K_0$ so that $\|\Sigma_n(P)\|_{o,2} \leq K_0$ for all $n$ and $P \in \mathbf{P}$, and then note that

$$\Big\|\frac{1}{n}\{f^{d_n}_{n,P}(V_i) f^{d_n}_{n,P}(V_i)' - \Sigma_n(P)\}\Big\|_{o,2} \leq \frac{d_n C_n^2}{n} + \frac{K_0}{n} \tag{H.34}$$

almost surely for all $P \in \mathbf{P}$ since each entry of the matrix $f^{d_n}_{n,P}(V_i) f^{d_n}_{n,P}(V_i)'$ is bounded by $C_n^2$. Similarly, exploiting $\|f^{d_n}_{n,P}(V_i) f^{d_n}_{n,P}(V_i)'\|_{o,2} \leq d_n C_n^2$ almost surely we obtain

$$\Big\|\frac{1}{n} E_P[\{f^{d_n}_{n,P}(V_i) f^{d_n}_{n,P}(V_i)' - \Sigma_n(P)\}^2]\Big\|_{o,2} \leq \frac{1}{n}\|E_P[\{f^{d_n}_{n,P}(V_i) f^{d_n}_{n,P}(V_i)'\}^2]\|_{o,2} + \frac{1}{n}\|\Sigma_n^2(P)\|_{o,2} \leq \frac{d_n C_n^2 K_0}{n} + \frac{K_0^2}{n} . \tag{H.35}$$

Thus, employing results (H.34) and (H.35), together with $d_n \log(d_n) C_n^2 = o(n)$, we obtain from Theorem 6.1 in Tropp (2012) (Bernstein's inequality for matrices) that

$$\begin{aligned}
\limsup_{M \uparrow \infty} \limsup_{n \to \infty} \sup_{P \in \mathbf{P}} & P\Big(\|\hat\Sigma_n(P) - \Sigma_n(P)\|_{o,2} > M \sqrt{\frac{d_n \log(d_n) C_n^2}{n}}\Big) \\
&\leq \limsup_{M \uparrow \infty} \limsup_{n \to \infty} d_n \exp\Big\{-\frac{M^2 d_n \log(d_n) C_n^2\, n^{-1}}{2 n^{-1} (d_n C_n^2 + K_0)(K_0 + M)}\Big\} = 0 .
\end{aligned} \tag{H.36}$$

Since $\hat\Sigma_n(P) \geq 0$ and $\Sigma_n(P) \geq 0$, Theorem X.1.1 in Bhatia (1997) in turn implies that

$$\|\hat\Sigma_n^{1/2}(P) - \Sigma_n^{1/2}(P)\|_{o,2} \leq \|\hat\Sigma_n(P) - \Sigma_n(P)\|_{o,2}^{1/2} \tag{H.37}$$

almost surely, and hence the claim of the Lemma follows from (H.36) and (H.37).
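The final step of the proof rests on the operator-norm inequality (H.37) for square roots of positive semidefinite matrices. As a sanity check rather than a proof, the sketch below evaluates both sides for a sample second-moment matrix and its population counterpart in a hypothetical design that is not from the paper: independent coordinates uniform on $[-1, 1]$, for which $E[f f'] = I/3$. The helper names `psd_sqrt` and `sqrt_gap_and_bound` are assumptions of the example.

```python
import numpy as np

def psd_sqrt(a):
    """Symmetric square root of a positive semidefinite matrix via eigh."""
    lam, nu = np.linalg.eigh(a)
    return (nu * np.sqrt(np.clip(lam, 0.0, None))) @ nu.T

def sqrt_gap_and_bound(a, b):
    """Return (||a^{1/2} - b^{1/2}||, ||a - b||^{1/2}) in the operator 2-norm."""
    gap = np.linalg.norm(psd_sqrt(a) - psd_sqrt(b), ord=2)
    bound = np.sqrt(np.linalg.norm(a - b, ord=2))
    return float(gap), float(bound)

rng = np.random.default_rng(0)
d, n = 6, 40
f = rng.uniform(-1.0, 1.0, size=(n, d))
sigma_hat = f.T @ f / n      # sample second-moment matrix
sigma = np.eye(d) / 3.0      # E[f f'] for independent Uniform[-1, 1] coordinates
gap, bound = sqrt_gap_and_bound(sigma_hat, sigma)
```

The value `gap` does not exceed `bound`, in line with (H.37). The inequality is what converts the $\{d_n \log(d_n) C_n^2/n\}^{1/2}$ rate for $\hat\Sigma_n(P) - \Sigma_n(P)$ in (H.36) into the slower $\{d_n \log(d_n) C_n^2/n\}^{1/4}$ rate for the square roots stated in the Lemma.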